CN107665248A - Text classification method and device based on deep learning hybrid model - Google Patents

Text classification method and device based on deep learning hybrid model

Info

Publication number
CN107665248A
CN107665248A
Authority
CN
China
Prior art keywords
deep learning
mixed model
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710864498.XA
Other languages
Chinese (zh)
Inventor
杨振宇
庞雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN201710864498.XA
Publication of CN107665248A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text classification method and device based on a deep learning hybrid model. The method includes: obtaining text data and preprocessing the text data; performing feature learning on the text data with a deep learning hybrid model that combines a denoising autoencoder with a deep belief network; and classifying the features obtained by the learning with a Softmax regression model. The deep learning hybrid model of the present invention has strong adaptability, and the classification method can meet the classification needs of most kinds of text.

Description

Text classification method and device based on deep learning hybrid model
Technical field
The invention belongs to the field of deep network models, and in particular relates to a text classification method and device based on a deep learning hybrid model.
Background art
With the continuous development of information technology, the amount of electronic text is growing rapidly, signaling the arrival of the big data era. In this context, organizing and exploiting such massive amounts of text effectively has become especially important. Text classification, as a technical foundation of fields such as information retrieval, digital libraries, and information filtering, has broad application prospects.
Text representation has always been a key problem in the field of natural language processing. Traditional text representations suffer from the curse of dimensionality and data sparsity, which has become a bottleneck limiting the performance of many natural language processing tasks. Deep learning models have deep structures of multi-layer nonlinear mappings, and training such models as multi-layer neural networks can effectively reduce the dimensionality of the data. At the same time, deep learning can approximate complex functions with fewer parameters and learn good features from data. Most importantly, a deep neural network can be unrolled into a BP (back-propagation) neural network after training, so BP error back-propagation can be used to optimize the performance of the whole network. However, the many possible network types, layer counts, and combinations of networks mean that neural-network-based classification models currently come in many forms, and the prior art does not clearly state which model, or which hybrid of models, possesses better classification performance or stronger adaptability.
Therefore, finding a classification method with better performance and stronger generality remains a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a text classification method based on a deep learning hybrid model. Feature learning on text is performed by combining the denoising autoencoder and the deep belief network from the family of deep learning models, and classification is finally performed with a Softmax regression model. The deep learning hybrid model has strong adaptability, and the classification method can meet the classification needs of most kinds of text.
To achieve the above object, the present invention adopts the following technical scheme:
A text classification method based on a deep learning hybrid model comprises the following steps:
Step 1: obtain text data and preprocess the text data;
Step 2: perform feature learning on the text data with a deep learning hybrid model combining a denoising autoencoder and a deep belief network;
Step 3: classify the features obtained by the learning with a Softmax regression model.
Further, the preprocessing includes:
(1) preliminarily filtering the text data to be classified;
(2) segmenting the text data into words, and further filtering the text data on the basis of the segmentation;
(3) representing the text features with a VSM (vector space model).
Further, the preliminary filtering includes removing forms and punctuation marks from the text; the further filtering includes removing stop words and filtering by part of speech, retaining only verbs and nouns.
Further, the deep learning hybrid model combining the denoising autoencoder and the deep belief network is formed by cascading a denoising autoencoder with a deep belief network, wherein the output of the denoising autoencoder serves as the input of the deep belief network.
Further, the denoising autoencoder is arranged as two layers: the first layer maps the input data to a higher-dimensional space and its output serves as the input of the second layer, which compresses the data; the resulting data serve as the input of the deep belief network, which has five layers.
Further, the denoising autoencoder is trained as follows:
First, the input vector x is corrupted to obtain x̂, and the first-layer encoder reduces the dimensionality of the high-dimensional data; through the activation function and a linear transformation, the hidden coding result y is obtained:

y = h(x̂) = s_f(W·x̂ + a_y)

where s_f is the nonlinear activation function, with expression s_f(t) = 1/(1 + e^(-t));
Then, the second-layer decoder f(y) maps the hidden-layer data y to the reconstruction z:

z = f(y) = s_g(W'·y + a_z)

where s_g is the activation function of the decoder (a sigmoid function is used), W' = W^T is the transpose of W, and a_y and a_z are bias vectors;
The iteration is performed to find, on the training sample set, the parameters θ = {W, a_y, a_z} that minimize the reconstruction error, and W, a_y and a_z are updated according to the following formulas:

W ← W - λ·∂L/∂W,  a_y ← a_y - λ·∂L/∂a_y,  a_z ← a_z - λ·∂L/∂a_z

where λ is the learning rate.
Further, the reconstruction error is expressed as:

θ* = argmin_θ (1/N) Σ_{i=1}^{N} L(X_i, Z_i), with the cross-entropy loss L(x, z) = -Σ_k [x_k·log z_k + (1 - x_k)·log(1 - z_k)]

where N is the number of training samples, X_i is the i-th input, and Z_i is the data after the i-th decoding and reconstruction.
Further, the deep belief network is trained as follows:
Pre-training is first performed according to the layer-wise greedy method, and the network is then fine-tuned using BP error back-propagation; during pre-training, the weights are computed as follows:
For each record X in the training set, X is assigned to the visible layer v^(0), and the probability that it activates the hidden neurons is computed:

P(h_j^(0) = 1 | v^(0)) = σ(b_j + Σ_i W_{ji}·v_i^(0))

where superscripts distinguish different vectors and subscripts distinguish different dimensions of the same vector. Then a sample h^(0) ~ P(h^(0) | v^(0)) is drawn from the computed distribution, the visible layer is reconstructed with h^(0) via P(v_i^(1) = 1 | h^(0)) = σ(a_i + Σ_j W_{ji}·h_j^(0)), and a sample v^(1) ~ P(v^(1) | h^(0)) of the visible layer is drawn; the probability that the hidden neurons are activated is then computed from the reconstructed visible-layer neurons, P(h_j^(1) = 1 | v^(1)) = σ(b_j + Σ_i W_{ji}·v_i^(1)), and the weights are updated according to the formula:

W ← W + λ·(P(h^(0) = 1 | v^(0))·v^(0)T - P(h^(1) = 1 | v^(1))·v^(1)T)
According to the second object of the invention, the present invention further provides a text classification device based on a deep learning hybrid model, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the text classification based on a deep learning hybrid model according to any one of claims 1-8.
According to the third object of the invention, the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, performs the text classification based on a deep learning hybrid model according to any one of claims 1-8.
Beneficial effects of the present invention
1. In the new feature-learning hybrid model for text classification proposed by the present invention, a denoising autoencoder and a deep belief network are cascaded, with the output of the denoising autoencoder serving as the input of the deep belief network. Experimental results show that this deep learning hybrid model has strong adaptability and can meet the classification requirements of most kinds of text; moreover, the model is simple and improves text classification performance.
2. The text classification method provided by the invention can be applied to the mining of any text as required; it is practical and easy to popularize, with applications such as web page classification, microblog sentiment analysis, user-comment mining and analysis, information retrieval, digital libraries, and information filtering.
Brief description of the drawings
The accompanying drawings, which form a part of the application, are provided for further understanding of the application; the exemplary embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of it.
Fig. 1 is the flow chart of the text classification method based on the deep learning hybrid model of the present invention;
Fig. 2 shows the influence of different corruption rates in the denoising autoencoder on classification;
Fig. 3 shows the influence of the number of hidden-layer nodes in the deep belief network on the classification results;
Fig. 4 shows the influence of different classification algorithms on classification accuracy.
Detailed description of the embodiments
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the application belongs.
It should be noted that the terms used herein are merely for describing embodiments and are not intended to limit the exemplary embodiments according to the application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
In the case of no conflict, the features of the embodiments in the application may be combined with each other.
Embodiment one
This embodiment discloses a text classification method based on a deep learning hybrid model which, as shown in Fig. 1, comprises the following steps:
Step 1: obtain the text data to be classified and preprocess the text data;
The preprocessing specifically includes:
(1) preliminarily filtering the text data to be classified;
Specifically, useless information in the text documents, such as forms and punctuation marks, is removed.
(2) segmenting the text data into words, and further filtering the text data on the basis of the segmentation;
Here the text is segmented with the NLPIR Chinese word segmentation system, and stop words in the documents are removed after segmentation. Part-of-speech tagging is carried out with the ICTCLAS system of the Chinese Academy of Sciences; features with no predictive power, such as auxiliary words and prepositions, are removed, and only verbs and nouns are extracted as feature words.
(3) representing the text features with a VSM model; a sketch of this whole pipeline is given below.
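A minimal Python sketch of the preprocessing pipeline. The patent names NLPIR and ICTCLAS for segmentation and part-of-speech tagging; the sketch substitutes the freely available jieba tagger and builds the VSM representation with a TF-IDF vectorizer, so the tool choice, the tiny stop-word list, and the helper names are illustrative assumptions rather than the patent's exact implementation.

import re
import jieba.posseg as pseg
from sklearn.feature_extraction.text import TfidfVectorizer

STOPWORDS = {"的", "了", "和", "是"}  # illustrative; a real stop-word list is far larger

def preprocess(doc: str) -> str:
    # (1) Preliminary filtering: strip punctuation and form/table residue
    # (\w matches CJK characters in Python 3, so Chinese text is kept).
    doc = re.sub(r"[^\w]+", " ", doc)
    # (2) Segment, drop stop words, keep only nouns (n*) and verbs (v*).
    kept = [w for w, flag in pseg.cut(doc)
            if w not in STOPWORDS and flag[:1] in ("n", "v")]
    return " ".join(kept)

def build_vsm(corpus):
    # (3) VSM feature representation: TF-IDF over the filtered tokens.
    vectorizer = TfidfVectorizer(token_pattern=r"\S+")
    features = vectorizer.fit_transform(preprocess(d) for d in corpus)
    return features, vectorizer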
Step 2: perform feature learning on the text data with the deep learning hybrid model combining the denoising autoencoder and the deep belief network;
Specifically, the feature representation is input into a deep learning hybrid model (DABN) combining a denoising autoencoder (DAE) with a deep belief network (DBN) for feature learning.
The feature learning stage proceeds as follows:
First, another representation of the initial features is learned by a two-layer DAE. The first-layer denoising autoencoder maps the input data to a higher-dimensional space, giving it stronger separability; the input of the second-layer denoising autoencoder is the output of the first layer, and it compresses the data. Deeper feature extraction is then performed on the text with a 5-layer DBN, taking the output data of the DAE as the input data of the DBN.
The denoising autoencoder is trained as follows:
The encoder performs the dimensionality-reduction operation on the high-dimensional data: the input vector x is first corrupted to obtain x̂, which is then fed into the encoder; through the activation function and a linear transformation, the hidden coding result y is finally obtained:

y = h(x̂) = s_f(W·x̂ + a_y)

The decoder f(y), which maps the hidden-layer data back to the reconstruction z, is expressed as the function:

z = f(y) = s_g(W'·y + a_z)

where s_f is the nonlinear activation function, with expression s_f(t) = 1/(1 + e^(-t)); s_g is the activation function of the decoder, for which a sigmoid function is used here; W' = W^T is the transpose of W; and a_y and a_z are bias vectors.
Training the DAE means finding, on the training sample set, the parameters θ = {W, a_y, a_z} that minimize the reconstruction error:

θ* = argmin_θ (1/N) Σ_{i=1}^{N} L(X_i, Z_i)

where L is the reconstruction error function. The cross-entropy loss is used here, with expression:

L(x, z) = -Σ_k [x_k·log z_k + (1 - x_k)·log(1 - z_k)]

where N is the number of training samples, X_i is the i-th input, and Z_i is the data after the i-th decoding and reconstruction. In each iteration the DAE updates the weight matrix as W ← W - λ·∂L/∂W, where λ is the learning rate; a_y is updated as a_y ← a_y - λ·∂L/∂a_y, and a_z as a_z ← a_z - λ·∂L/∂a_z.
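As a concrete illustration, the following NumPy sketch implements one training step of such a denoising-autoencoder layer under the formulas above: sigmoid activations on both sides, tied weights W' = W^T, masking corruption, and plain per-sample gradient descent. The class name, initialization scale, and default hyperparameters are illustrative assumptions.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

class DAELayer:
    """One denoising-autoencoder layer with tied weights (W' = W^T)."""
    def __init__(self, n_in, n_hidden, corruption=0.16, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_hidden, n_in))
        self.a_y = np.zeros(n_hidden)   # encoder bias a_y
        self.a_z = np.zeros(n_in)       # decoder bias a_z
        self.corruption, self.lr = corruption, lr

    def train_step(self, x):
        # Inputs are assumed scaled to [0, 1] (e.g. normalized TF-IDF),
        # as required by the cross-entropy reconstruction loss.
        # Corrupt the input: zero a random fraction of entries to get x_hat.
        x_hat = x * (self.rng.random(x.shape) >= self.corruption)
        y = sigmoid(self.W @ x_hat + self.a_y)      # y = s_f(W x_hat + a_y)
        z = sigmoid(self.W.T @ y + self.a_z)        # z = s_g(W' y + a_z)
        # Gradients of the cross-entropy loss for sigmoid units.
        dz = z - x                                  # dL/d(decoder pre-activation)
        dy = (self.W @ dz) * y * (1.0 - y)          # dL/d(encoder pre-activation)
        dW = np.outer(dy, x_hat) + np.outer(y, dz)  # both uses of the tied W
        self.W -= self.lr * dW
        self.a_y -= self.lr * dy
        self.a_z -= self.lr * dz
        eps = 1e-7                                  # numeric guard for the logs
        return -np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps))

Stacking two such layers, the first widening the representation and the second compressing it, reproduces the two-layer DAE described above.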
The deep belief network is trained as follows:
DBN training includes two processes, pre-training and fine-tuning. Pre-training is first carried out according to the layer-wise greedy method:
1. First, the first RBM is fully trained;
2. The offsets and weights of the first RBM are fixed, and its hidden layer is used as the input of the second RBM;
3. After the second RBM has been fully trained, it is stacked on top of the first RBM;
4. Steps 1-3 are repeated as many times as needed;
5. If the training-set data are labelled, then when the top-level RBM is trained, the neurons representing the classification labels are trained together with the visible-layer neurons.
The weights are computed as follows:
For each record X in the training set, X is assigned to the visible layer v^(0), and the probability that it activates the hidden neurons is computed:

P(h_j^(0) = 1 | v^(0)) = σ(b_j + Σ_i W_{ji}·v_i^(0))

where superscripts distinguish different vectors and subscripts distinguish different dimensions of the same vector. Then a sample h^(0) ~ P(h^(0) | v^(0)) is drawn from the computed distribution, the visible layer is reconstructed with h^(0) via P(v_i^(1) = 1 | h^(0)) = σ(a_i + Σ_j W_{ji}·h_j^(0)), and a sample v^(1) ~ P(v^(1) | h^(0)) of the visible layer is drawn; the probability that the hidden neurons are activated is then computed from the reconstructed visible-layer neurons, P(h_j^(1) = 1 | v^(1)) = σ(b_j + Σ_i W_{ji}·v_i^(1)), and the weights are updated according to the formula:

W ← W + λ·(P(h^(0) = 1 | v^(0))·v^(0)T - P(h^(1) = 1 | v^(1))·v^(1)T)
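The update above is one step of contrastive divergence (CD-1). The following NumPy sketch implements it for a single RBM layer under the stated formulas, assuming binary units and one training record at a time; momentum, weight decay, and mini-batching are omitted, and the function name is an illustrative choice.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def cd1_step(W, a, b, v0, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 update. W: (n_hidden, n_visible) weights, a: visible bias,
    b: hidden bias, v0: one binary training record."""
    # P(h(0) = 1 | v(0)) and a sample h(0) drawn from it.
    p_h0 = sigmoid(b + W @ v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Reconstruct the visible layer and draw a sample v(1).
    p_v1 = sigmoid(a + W.T @ h0)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    # P(h(1) = 1 | v(1)) from the reconstructed visible layer.
    p_h1 = sigmoid(b + W @ v1)
    # W <- W + lr * (P(h(0)=1|v(0)) v(0)^T - P(h(1)=1|v(1)) v(1)^T),
    # with the customary bias updates alongside.
    W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, v1))
    a += lr * (v0 - v1)
    b += lr * (p_h0 - p_h1)
    return W, a, b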
Because the DBN is trained according to the layer-wise greedy method, the error of an earlier RBM is passed on gradually to the RBMs of later layers without being corrected. The network can therefore be fine-tuned with BP error back-propagation: the weights of the DBN are used to initialize the weights of the BP neural network, which makes the classification effect better.
Step 3: classify the features obtained by the learning with a Softmax regression model.
The Softmax regression model is the extension of the logistic regression model to multi-class problems (logistic regression solves binary classification). The text features output by the deep learning hybrid model are used as the input of the Softmax regression model to classify the text.
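For illustration, a minimal Softmax-regression sketch over the learned features follows. The patent does not specify the training procedure for this stage, so plain batch gradient descent on the cross-entropy objective is an assumption, as are the function names; a bias term is omitted for brevity.

import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.1, epochs=200):
    """X: (n_samples, n_features) learned features; y: integer class labels."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]                 # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W)                   # class probabilities
        W -= lr * X.T @ (P - Y) / n          # cross-entropy gradient step
    return W

def predict(W, X):
    return np.argmax(X @ W, axis=1)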
Optionally, the method further includes step 4: evaluating the classification performance. Specifically, the main performance evaluation indicators of text classification are recall, precision, and a combined accuracy measure.
Embodiment two
The purpose of this embodiment is to provide a computing device.
A text classification device based on a deep learning hybrid model includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the following steps are implemented:
Step 1: obtain text data and preprocess the text data;
Step 2: perform feature learning on the text data with a deep learning hybrid model combining a denoising autoencoder and a deep belief network;
Step 3: classify the features obtained by the learning with a Softmax regression model.
Embodiment three
The purpose of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium on which a computer program for text classification based on a deep learning hybrid model is stored; when executed by a processor, the program performs the following steps:
Step 1: obtain text data and preprocess the text data;
Step 2: perform feature learning on the text data with a deep learning hybrid model combining a denoising autoencoder and a deep belief network;
Step 3: classify the features obtained by the learning with a Softmax regression model.
The steps involved in embodiments two and three above correspond to method embodiment one; for details, see the related description of embodiment one. The term "computer-readable storage medium" should be understood as a single medium or multiple media that include one or more instruction sets; it should also be understood as including any medium that can store, encode or carry an instruction set to be executed by a processor and that causes the processor to perform any of the methods of the present invention.
Experimental result
The experimental data set chosen here comes from the Fudan University corpus (Natural Language Processing Group, International Database Center, Department of Computer Information and Technology, Fudan University), provided by Li Ronglu of Fudan University. Ten categories, 4000 documents in total, were selected for the experiments, with 400 samples per category; the training corpus and the test corpus are split in a 1:1 ratio.
1. The DAE uses two layers of 2000 and 1000 nodes respectively and 2000 iterations, and classification experiments are carried out with corruption rates of 0.1, 0.14, 0.18, 0.22, 0.26 and 0.3. The representation matrix of each layer is first subjected to the randomized zeroing operation and then trained; the next layer is trained only after the first layer's training has finished.
2. The DBN neural network has 5 layers of 600-500-300-100-15 nodes respectively, and the number of BP fine-tuning iterations is 300.
3. SVM classification experiments are carried out with the libsvm toolbox, and KNN classification experiments with the knnclassify classifier provided by MATLAB.
The main performance evaluation indicators of text classification are recall, precision, and a combined accuracy measure. Suppose that in the classification results for category a_i, the number of samples correctly assigned to the category is b, the number of samples of this category wrongly assigned to other categories is c, and the number of samples of other categories wrongly assigned to this category is d, with C categories in total.
Recall: Recall = b/(b+c), which measures how completely the samples of a category are retrieved.
Precision: Precision = b/(b+d), which measures how accurately samples are assigned to a category.
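In code, these per-class measures can be read directly off a confusion matrix; the sketch below assumes rows index the true class and columns the predicted class, a layout chosen here for illustration.

import numpy as np

def per_class_metrics(M):
    """M[i, j] counts samples of true class i predicted as class j."""
    b = np.diag(M).astype(float)   # correctly assigned to each class
    c = M.sum(axis=1) - b          # this class wrongly sent elsewhere
    d = M.sum(axis=0) - b          # other classes wrongly pulled in
    recall = b / (b + c)
    precision = b / (b + d)
    return recall, precision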
Table 1 records the comparison of the accuracy and recall of the different classification algorithms:
Fig. 2 shows the influence of different corruption rates in the denoising autoencoder on classification. As can be seen from the figure, once a corruption rate is introduced, the influence of the DAE on classification follows a roughly parabolic curve: the classification accuracy is lowest at corruption rates of 0.08 and 0.24, and highest at a corruption rate of 0.16.
Fig. 3 shows the influence of the number of first-layer hidden nodes in the deep belief network on classification accuracy. As the number of DBN hidden-layer nodes increases, the recall and accuracy of text classification rise at first but decline once the number of hidden units exceeds 600, mainly because too many or too few hidden units are both unfavorable for expressing the data features. When the number of hidden-layer nodes is 600, the accuracy of text classification reaches a maximum of 91% and the recall a maximum of 89%.
Fig. 4 shows the influence of different classification algorithms on classification accuracy. From Table 1 and Fig. 4 it can be seen that the classification effect of the deep learning hybrid model proposed here is better than that of the traditional classification algorithms.
The above embodiments are only specific cases of the present invention, and the patent protection scope of the invention includes but is not limited to the above embodiments. Any text classification method that conforms to the claims of the present invention, and any appropriate change or replacement made to it by one of ordinary skill in any technical field, shall fall within the patent protection scope of the present invention.
Those skilled in the art will understand that the modules or steps of the invention described above can be implemented with a general-purpose computing device; alternatively, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the above specific embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that, on the basis of the technical scheme of the present invention, various modifications or variations that can be made without creative effort still fall within the protection scope of the present invention.

Claims (10)

1. A text classification method based on a deep learning hybrid model, characterized by comprising the following steps:
Step 1: obtaining text data and preprocessing the text data;
Step 2: performing feature learning on the text data with a deep learning hybrid model combining a denoising autoencoder and a deep belief network;
Step 3: classifying the features obtained by the learning with a Softmax regression model.
2. The text classification method based on a deep learning hybrid model according to claim 1, characterized in that the preprocessing includes:
(1) preliminarily filtering the text data to be classified;
(2) segmenting the text data into words, and further filtering the text data on the basis of the segmentation;
(3) representing the text features with a VSM model.
3. The text classification method based on a deep learning hybrid model according to claim 2, characterized in that the preliminary filtering includes removing forms and punctuation marks from the text; the further filtering includes removing stop words and filtering by part of speech, retaining only verbs and nouns.
4. The text classification method based on a deep learning hybrid model according to claim 1, characterized in that the deep learning hybrid model combining the denoising autoencoder and the deep belief network is formed by cascading a denoising autoencoder with a deep belief network, wherein the output of the denoising autoencoder serves as the input of the deep belief network.
5. The text classification method based on a deep learning hybrid model according to claim 4, characterized in that the denoising autoencoder is arranged as two layers: the first layer maps the input data to a higher-dimensional space and its output serves as the input of the second layer, which compresses the data; the resulting data serve as the input of the deep belief network, and the deep belief network has five layers.
6. The text classification method based on a deep learning hybrid model according to claim 4, characterized in that the denoising autoencoder is trained as follows:
First, the input vector x is corrupted to obtain x̂, and the first-layer encoder reduces the dimensionality of the high-dimensional data; through the activation function and a linear transformation, the hidden coding result y is obtained:

y = h(x̂) = s_f(W·x̂ + a_y)

where s_f is the nonlinear activation function, with expression s_f(t) = 1/(1 + e^(-t));
Then, the second-layer decoder f(y) maps the hidden-layer data y to the reconstruction z:

z = f(y) = s_g(W'·y + a_z)

where s_g is the activation function of the decoder (a sigmoid function is used), W' = W^T is the transpose of W, and a_y and a_z are bias vectors;
The iteration is performed to find, on the training sample set, the parameters θ = {W, a_y, a_z} that minimize the reconstruction error, and W, a_y and a_z are updated according to the following formulas:

W ← W - λ·∂L/∂W,  a_y ← a_y - λ·∂L/∂a_y,  a_z ← a_z - λ·∂L/∂a_z

where λ is the learning rate.
7. The text classification method based on a deep learning hybrid model according to claim 6, characterized in that the reconstruction error is expressed as:

θ* = argmin_θ (1/N) Σ_{i=1}^{N} L(X_i, Z_i), with the cross-entropy loss L(x, z) = -Σ_k [x_k·log z_k + (1 - x_k)·log(1 - z_k)]

where N is the number of training samples, X_i is the i-th input, and Z_i is the data after the i-th decoding and reconstruction.
8. The text classification method based on a deep learning hybrid model according to claim 1, characterized in that the deep belief network is trained as follows:
Pre-training is first performed according to the layer-wise greedy method, and the network is then fine-tuned using BP error back-propagation; during pre-training, the weights are computed as follows:
For each record X in the training set, X is assigned to the visible layer v^(0), and the probability that it activates the hidden neurons is computed:

P(h_j^(0) = 1 | v^(0)) = σ(b_j + Σ_i W_{ji}·v_i^(0))

where superscripts distinguish different vectors and subscripts distinguish different dimensions of the same vector; then a sample h^(0) ~ P(h^(0) | v^(0)) is drawn from the computed distribution, the visible layer is reconstructed with h^(0) via P(v_i^(1) = 1 | h^(0)) = σ(a_i + Σ_j W_{ji}·h_j^(0)), and a sample v^(1) ~ P(v^(1) | h^(0)) of the visible layer is drawn; the probability that the hidden neurons are activated is then computed from the reconstructed visible-layer neurons, P(h_j^(1) = 1 | v^(1)) = σ(b_j + Σ_i W_{ji}·v_i^(1)), and the weights are updated according to the formula:

W ← W + λ·(P(h^(0) = 1 | v^(0))·v^(0)T - P(h^(1) = 1 | v^(1))·v^(1)T).
9. A text classification device based on a deep learning hybrid model, including a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the text classification based on a deep learning hybrid model according to any one of claims 1-8.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, performs the text classification based on a deep learning hybrid model according to any one of claims 1-8.
CN201710864498.XA 2017-09-22 2017-09-22 Text classification method and device based on deep learning hybrid model Pending CN107665248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710864498.XA CN107665248A (en) 2017-09-22 2017-09-22 Text classification method and device based on deep learning hybrid model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710864498.XA CN107665248A (en) 2017-09-22 2017-09-22 Text classification method and device based on deep learning hybrid model

Publications (1)

Publication Number Publication Date
CN107665248A true CN107665248A (en) 2018-02-06

Family

ID=61097104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710864498.XA Pending CN107665248A (en) 2017-09-22 2017-09-22 Text classification method and device based on deep learning hybrid model

Country Status (1)

Country Link
CN (1) CN107665248A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302884A (en) * 2015-10-19 2016-02-03 天津海量信息技术有限公司 Deep learning-based webpage mode recognition method and visual structure learning method
CN106295245A (en) * 2016-07-27 2017-01-04 广州麦仑信息科技有限公司 The method of storehouse noise reduction own coding gene information feature extraction based on Caffe

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Chao: "Research on text classification based on a deep learning hybrid model", Wanfang Database *
Hu Zhen et al.: "The composer classification problem based on deep learning", Journal of Computer Research and Development *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447565B (en) * 2018-03-23 2021-10-08 北京工业大学 Small gestational age infant prediction method based on improved noise reduction automatic encoder
CN108447565A (en) * 2018-03-23 2018-08-24 北京工业大学 A kind of small for gestational age infant disease forecasting method based on improvement noise reduction autocoder
CN108595717A (en) * 2018-05-18 2018-09-28 北京慧闻科技发展有限公司 For the data processing method of text classification, data processing equipment and electronic equipment
CN108763384A (en) * 2018-05-18 2018-11-06 北京慧闻科技发展有限公司 For the data processing method of text classification, data processing equipment and electronic equipment
CN109299246A (en) * 2018-12-04 2019-02-01 北京容联易通信息技术有限公司 A kind of file classification method and device
CN109829054A (en) * 2019-01-17 2019-05-31 齐鲁工业大学 A kind of file classification method and system
CN110750640A (en) * 2019-09-17 2020-02-04 平安科技(深圳)有限公司 Text data classification method and device based on neural network model and storage medium
CN110750640B (en) * 2019-09-17 2022-11-04 平安科技(深圳)有限公司 Text data classification method and device based on neural network model and storage medium
CN111309909A (en) * 2020-02-13 2020-06-19 北京工业大学 Text emotion classification method based on hybrid model
CN111309909B (en) * 2020-02-13 2021-07-30 北京工业大学 Text emotion classification method based on hybrid model
CN111274406A (en) * 2020-03-02 2020-06-12 湘潭大学 Text classification method based on deep learning hybrid model
CN112735604A (en) * 2021-01-13 2021-04-30 大连海事大学 Novel coronavirus classification method based on deep learning algorithm
CN112735604B (en) * 2021-01-13 2024-03-26 大连海事大学 Novel coronavirus classification method based on deep learning algorithm
CN113239190A (en) * 2021-04-27 2021-08-10 天九共享网络科技集团有限公司 Document classification method and device, storage medium and electronic equipment
CN113239190B (en) * 2021-04-27 2024-02-20 天九共享网络科技集团有限公司 Document classification method, device, storage medium and electronic equipment
CN113449491A (en) * 2021-07-05 2021-09-28 思必驰科技股份有限公司 Pre-training framework for language understanding and generation with two-stage decoder
CN113449491B (en) * 2021-07-05 2023-12-26 思必驰科技股份有限公司 Pre-training framework for language understanding and generation with two-stage decoder

Similar Documents

Publication Publication Date Title
CN107665248A (en) Text classification method and device based on deep learning hybrid model
Onan Sentiment analysis on product reviews based on weighted word embeddings and deep neural networks
Li et al. DeepPatent: patent classification with convolutional neural networks and word embedding
Tsaptsinos Lyrics-based music genre classification using a hierarchical attention network
CN107229610B (en) A kind of analysis method and device of affection data
CN104834747B (en) Short text classification method based on convolutional neural networks
CN110750640B (en) Text data classification method and device based on neural network model and storage medium
CN109558487A (en) Document Classification Method based on the more attention networks of hierarchy
Terechshenko et al. A comparison of methods in political science text classification: Transfer learning language models for politics
CN109189925A (en) Term vector model based on mutual information and based on the file classification method of CNN
CN107832400A (en) A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification
CN106294684A (en) The file classification method of term vector and terminal unit
Zhao et al. The study on the text classification for financial news based on partial information
CN105976056A (en) Information extraction system based on bidirectional RNN
CN108509982A (en) A method of the uneven medical data of two classification of processing
CN108090231A (en) A kind of topic model optimization method based on comentropy
CN105740236A (en) Writing feature and sequence feature combined Chinese sentiment new word recognition method and system
CN106959946A (en) A kind of text semantic feature generation optimization method based on deep learning
CN107688870A (en) A kind of the classification factor visual analysis method and device of the deep neural network based on text flow input
CN108920446A (en) A kind of processing method of Engineering document
CN115186069A (en) CNN-BiGRU-based academic text abstract automatic classification method
CN113204640A (en) Text classification method based on attention mechanism
Zhang et al. Structure learning for headline generation
CN116306785A (en) Student performance prediction method of convolution long-short term network based on attention mechanism
Bai et al. Gated character-aware convolutional neural network for effective automated essay scoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180206