CN103473483A - Online predicting method for structure and function of protein - Google Patents

Publication number
CN103473483A
Authority
CN
China
Prior art keywords
protein
data
function
support vector
protein sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013104590906A
Other languages
Chinese (zh)
Inventor
谢华林
黄建华
符靓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN2013104590906A
Publication of CN103473483A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an online method for predicting the structure and function of a protein, based on wavelet transformation and a support vector machine. The method comprises the following steps: 1, establishing a training sample set from a protein sequence dataset; 2, converting the amino acid sequences of the proteins into numerical sequences by using the physicochemical properties of the amino acids; 3, extracting features by wavelet transformation; 4, training a support vector machine on the resulting protein feature dataset; 5, reading in and converting the protein sequences to be predicted, and predicting the structure and function of the protein online. The method can predict the class and function of an unknown protein. Verification results show that it achieves good prediction accuracy for G protein-coupled receptors, enzymes, protein subcellular structure, and protein secondary structure. During online prediction, the user only needs to submit the protein sequence to be predicted on the prediction web page; the sequence data is converted, features are extracted by wavelet transformation, support vector machine training and target prediction are completed, and the prediction result is output.

Description

An online prediction method for protein structure and function
Technical field
The present invention relates to a method for online classification and prediction of protein families and functions based on a wavelet support vector machine, and belongs to the field of bioinformatics.
Background art
The objective of the invention is to overcome the deficiencies of the prior art by providing an online classification and prediction method for protein structure and function based on a wavelet support vector machine. The method exploits the notable strengths of the support vector machine classification technique in feature mapping to achieve online classification and prediction of protein families and functions, improves prediction accuracy, and provides a useful reference for laboratory staff.
Summary of the invention
To achieve the above objective, the technical solution of the invention is as follows. The support vector machine based online classification and prediction method for protein families and functions comprises the following steps:
(1) Establish the training samples of the protein sequence datasets: protein sequences are collected from the online protein database SWISS-PROT to build the training samples. The training sets cover G protein-coupled receptors, enzymes, protein subcellular structure, and protein secondary structure, and datasets can be added or updated as needed. Each dataset contains two classes: positive samples and negative samples;
(2) Convert the protein sequence datasets: the protein sequence datasets obtained in step (1) are converted into numerical sequences suitable for signal processing; each protein sequence in a dataset is converted into a numerical sequence using the physicochemical properties of its amino acids;
(3) Extract features by wavelet transformation: wavelet decomposition is applied to the numerical sequences obtained in step (2) to obtain the characteristic wavelet coefficients, and feature vectors are extracted from these coefficients;
(4) Train on the protein sequence datasets with a support vector machine: in essence, support vector machine (SVM) training uses the SVM to learn from the protein feature dataset generated in step (3), yielding an SVM classification and prediction model for protein families;
(5) Read in the protein sequence to be predicted, convert the data, and predict its protein family and function: a Servlet component written to the J2EE specification reads in the protein sequence data submitted by the Web client. The Servlet component first calls the validation component to check the submitted data and determine whether it is valid. If the data is invalid, the user is informed of the possible cause. If the data is valid, the predictor component is called and initialized, the data converter component is then called to convert the protein sequence into a numerical sequence, features are extracted by wavelet transformation, and finally the features are passed to the predictor component for prediction.
In the method for online prediction of protein families and functions based on a wavelet support vector machine according to claim 1, step (5), namely reading in the protein sequence to be predicted, converting the data, and predicting the class of the protein family and function, comprises the following specific steps:
(5-1) A Servlet component written to the J2EE specification reads in the protein sequence to be predicted from the Web client and converts its data online;
(5-2) The user requests classification of the protein family and function, and the wavelet support vector machine predictor component then performs the class prediction;
(5-3) The above Servlet component is called to output the protein family and function class prediction obtained in step (5-2) to the Web client's online page for display.
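The request handling in step (5) can be sketched as follows. The patent implements this flow with J2EE Servlet components; the Python below is only an illustrative stand-in, and every name in it (`validate_sequence`, `handle_submission`, and the injected `convert`, `extract`, and `predict` callables) is hypothetical. The accepted alphabet of 20 standard one-letter residue codes is likewise an assumption.

```python
from typing import Callable, Optional, Sequence

# The 20 standard amino acids in one-letter code (assumed input alphabet).
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(raw: str) -> Optional[str]:
    """Return None if the submission is valid, else a human-readable reason."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        return "empty submission"
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unknown:
        return "unsupported residue codes: " + ", ".join(unknown)
    return None


def handle_submission(
    raw: str,
    convert: Callable[[str], Sequence[float]],
    extract: Callable[[Sequence[float]], Sequence[float]],
    predict: Callable[[Sequence[float]], str],
) -> dict:
    """Validate, convert, extract features, then predict, in that order."""
    reason = validate_sequence(raw)
    if reason is not None:
        # Invalid data: report the possible cause instead of predicting.
        return {"ok": False, "reason": reason}
    sequence = "".join(raw.split()).upper()
    numeric = convert(sequence)   # data converter component
    features = extract(numeric)   # wavelet feature extraction
    return {"ok": True, "prediction": predict(features)}
```

Passing the converter, feature extractor, and predictor in as callables keeps the validation logic testable on its own and mirrors the separation into components that the text describes.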
Compared with the prior art, the support vector machine based online prediction method for protein families and functions of the present invention has the following outstanding substantive features and notable advantages:
1. High accuracy. The method can accurately predict the family and function of unknown proteins.
2. Short prediction time. Because wavelet transformation is used for feature extraction, the dimension of the feature vector is effectively reduced, which speeds up computation.
3. Low cost. The invention only needs existing, known proteins as a training set to build the model, and these can be obtained from free protein databases worldwide.
4. Convenient and fast. During online prediction, the user only needs to provide data in the required format on the prediction web page. The data is converted, support vector machine training and target class prediction are carried out, and the prediction result is returned.
Description of the drawings
Fig. 1 is a flowchart of the support vector machine based online prediction method for protein families and functions of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawing.
(1) Establish the training samples of the protein sequence datasets: protein sequences are collected from the online protein database SWISS-PROT to build the training samples. The training sets cover G protein-coupled receptors, enzymes, protein subcellular structure, and protein secondary structure, and datasets can be added or updated as needed. Each dataset contains two classes: positive samples and negative samples;
(2) Convert the protein sequence datasets: the protein sequence datasets obtained in step (1) are converted into numerical sequences suitable for signal processing; each protein sequence in a dataset is converted into a numerical sequence using the physicochemical properties of its amino acids;
(3) Extract features by wavelet transformation: wavelet decomposition is applied to the numerical sequences obtained in step (2) to obtain the characteristic wavelet coefficients, and feature vectors are extracted from these coefficients;
(4) Train on the protein sequence datasets with a support vector machine: in essence, support vector machine (SVM) training uses the SVM to learn from the protein feature dataset generated in step (3), yielding an SVM classification and prediction model for protein families. This model can reproduce the input-output relationship of the training data.
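To make the training in step (4) concrete, the sketch below fits a soft-margin linear SVM by sub-gradient descent on the hinge loss. This is a simplification chosen for illustration and is not the patent's implementation; the function names, learning rate, regularization strength, and epoch count are all assumptions.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],        # +1 for positive samples, -1 for negative samples
    epochs: int = 100,            # assumed value
    learning_rate: float = 0.1,   # assumed value
    reg: float = 0.01,            # assumed L2 regularization strength
) -> Tuple[List[float], float]:
    """Minimize reg/2 * |w|^2 + hinge loss with per-sample sub-gradient steps."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # The regularization term shrinks the weights on every step.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if y * score < 1.0:
                # Margin violated: move the hyperplane toward this sample's side.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def classify(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0.0 else -1
```

In this sketch the feature vectors from step (3) would be passed as `samples`, with `+1` and `-1` labels standing for the positive and negative classes of a dataset.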
The support vector machine used for modeling in the present invention is an SVM algorithm implemented in the Python programming language. Three kernel functions are tested:
(1) linear kernel function;
(2) radial basis kernel function;
(3) polynomial kernel function.
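The kernel formulas were embedded as images in the original and do not appear in this text. As a reference point only, the sketch below gives the standard textbook forms of the three kernels, using the `gamma`, `coef0`, and `degree` parametrization found in LIBSVM (which the record lists among its non-patent citations). The patent's actual parameter values are not stated, so the defaults here are assumptions.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """K(x, y) = exp(-gamma * |x - y|^2); gamma default is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(
    x: Sequence[float],
    y: Sequence[float],
    gamma: float = 1.0,   # assumed value
    coef0: float = 1.0,   # assumed value
    degree: int = 3,      # assumed value
) -> float:
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * linear_kernel(x, y) + coef0) ** degree
```

The linear kernel has no hyperparameters, while the radial basis and polynomial kernels need `gamma` (and `coef0`, `degree`) to be chosen, typically by cross-validation on the training set.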
(5) Read in the protein sequence to be predicted, convert the data, and predict the protein family and function. The specific steps are as follows:
(5-1) A Servlet component written to the J2EE specification reads in the protein sequence to be predicted from the Web client and converts its data online;
(5-2) The user requests classification of the protein family and function, and the wavelet support vector machine predictor component then performs the class prediction;
(5-3) The above Servlet component is called to output the protein family and function class prediction obtained in step (5-2) to the Web client's online page for display.
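Steps (2) and (3) of the embodiment can be illustrated with a small sketch. The text does not say which physicochemical property or which wavelet is used; the code below assumes the Kyte-Doolittle hydropathy index and the Haar wavelet, and it summarizes each decomposition band by four simple statistics so that proteins of different lengths yield feature vectors of the same length. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Kyte-Doolittle hydropathy index (an assumed choice of physicochemical property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numeric(sequence: str) -> List[float]:
    """Step (2): map each residue to its property value."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of Haar decomposition: (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd lengths by repeating the last value
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


def summarize(coeffs: Sequence[float]) -> List[float]:
    """Max, min, mean and standard deviation of one coefficient band."""
    mean = sum(coeffs) / len(coeffs)
    variance = sum((c - mean) ** 2 for c in coeffs) / len(coeffs)
    return [max(coeffs), min(coeffs), mean, math.sqrt(variance)]


def wavelet_features(signal: Sequence[float], levels: int = 3) -> List[float]:
    """Step (3): fixed-length features, 4 statistics per band, levels + 1 bands."""
    if len(signal) < 2 ** levels:
        raise ValueError("sequence too short for the requested number of levels")
    features: List[float] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(summarize(detail))
    features.extend(summarize(approx))  # coarsest approximation band
    return features
```

With the default of three levels, every sequence of at least eight residues maps to a 16-element feature vector, which is the fixed-size input a support vector machine needs.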

Claims (3)

1. An online prediction method for protein structure and function based on wavelet transformation and a support vector machine, characterized in that the method comprises the following steps:
(1) Establish the training samples of the protein sequence datasets: protein sequences are collected from the online protein database SWISS-PROT to build the training samples. The training sets cover G protein-coupled receptors, enzymes, protein subcellular structure, and protein secondary structure, and datasets can be added or updated as needed. Each dataset contains two classes: positive samples and negative samples;
(2) Convert the protein sequence datasets: the protein sequence datasets obtained in step (1) are converted into numerical sequences suitable for signal processing; each protein sequence in a dataset is converted into a numerical sequence using the physicochemical properties of its amino acids;
(3) Extract features by wavelet transformation: wavelet decomposition is applied to the numerical sequences obtained in step (2) to obtain the characteristic wavelet coefficients, and feature vectors are extracted from these coefficients;
(4) Train on the protein sequence datasets with a support vector machine: in essence, support vector machine (SVM) training uses the SVM to learn from the protein feature dataset generated in step (3), yielding an SVM classification and prediction model for protein families;
(5) Read in the protein sequence to be predicted, convert the data, and predict its protein family and function: a Servlet component written to the J2EE specification reads in the protein sequence data submitted by the Web client. The Servlet component first calls the validation component to check the submitted data and determine whether it is valid. If the data is invalid, the user is informed of the possible cause. If the data is valid, the predictor component is called and initialized, the data converter component is then called to convert the protein sequence into a numerical sequence, features are extracted by wavelet transformation, and finally the features are passed to the predictor component for prediction.
2. The method for online prediction of protein families and functions based on a wavelet support vector machine according to claim 1, characterized in that step (5), namely reading in the protein sequence to be predicted, converting the data, and predicting the class of the protein family and function, comprises the following specific steps:
(5-1) A Servlet component written to the J2EE specification reads in the protein sequence to be predicted from the Web client and converts its data online;
(5-2) The user requests classification of the protein family and function, and the wavelet support vector machine predictor component then performs the class prediction;
(5-3) The above Servlet component is called to output the protein family and function class prediction obtained in step (5-2) to the Web client's online page for display.
3. The method for online prediction of protein families and functions based on a wavelet support vector machine according to claim 2, characterized in that in step (5-2), in which the user has the wavelet support vector machine predictor component predict the class of the protein family and function, the specific steps are: when a protein classification prediction is made, the predictor component reads the protein classification model from disk, loads the model and completes initialization, reads in the data output by the data converter, processes this data with the loaded classification and prediction model, obtains the family classification of the protein, and outputs the result.
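The predictor component described in claim 3 can be pictured with the sketch below: it reads a stored classification model from disk, initializes once, and then classifies the feature vectors handed over by the data converter. The sketch assumes a linear decision function stored as JSON; the file layout, the class name `Predictor`, and the output labels are hypothetical, and a real deployment would load a model produced by an SVM library instead.

```python
import json
from pathlib import Path
from typing import List, Optional, Sequence, Union


class Predictor:
    """Loads a stored linear model lazily and classifies feature vectors."""

    def __init__(self, model_path: Union[str, Path]) -> None:
        self._model_path = Path(model_path)
        self._weights: Optional[List[float]] = None
        self._bias = 0.0

    def initialize(self) -> None:
        """Read the model file from disk and keep its parameters in memory."""
        model = json.loads(self._model_path.read_text(encoding="utf-8"))
        self._weights = [float(w) for w in model["weights"]]
        self._bias = float(model["bias"])

    def predict(self, features: Sequence[float]) -> str:
        """Classify one feature vector, initializing on first use."""
        if self._weights is None:
            self.initialize()
        if len(features) != len(self._weights):
            raise ValueError("feature vector length does not match the model")
        score = sum(w * f for w, f in zip(self._weights, features)) + self._bias
        return "positive" if score >= 0.0 else "negative"
```

Loading on first use means the model file is read once per predictor instance, so repeated web requests do not pay the disk cost again.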
CN2013104590906A 2013-10-07 2013-10-07 Online predicting method for structure and function of protein Pending CN103473483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013104590906A CN103473483A (en) 2013-10-07 2013-10-07 Online predicting method for structure and function of protein

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013104590906A CN103473483A (en) 2013-10-07 2013-10-07 Online predicting method for structure and function of protein

Publications (1)

Publication Number Publication Date
CN103473483A true CN103473483A (en) 2013-12-25

Family

ID=49798330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013104590906A Pending CN103473483A (en) 2013-10-07 2013-10-07 Online predicting method for structure and function of protein

Country Status (1)

Country Link
CN (1) CN103473483A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529206A (en) * 2016-12-20 2017-03-22 大连海事大学 Automatic wiring method of protein two-dimensional structure diagram function element
CN106599611A (en) * 2016-12-09 2017-04-26 中南大学 Marking method and system for protein functions
CN107423577A (en) * 2017-04-20 2017-12-01 北京工业大学 A kind of On Protein Fold Type Recognition based on amino acid sequence
CN107563150A (en) * 2017-08-31 2018-01-09 深圳大学 Forecasting Methodology, device, equipment and the storage medium of protein binding site
CN107924429A (en) * 2015-04-14 2018-04-17 皮阿赛勒公司 Method and electronic system, related computer program product at least one fitness value for predicting protein
CN109147868A (en) * 2018-07-18 2019-01-04 深圳大学 Protein function prediction technique, device, equipment and storage medium
CN109817275A (en) * 2018-12-26 2019-05-28 东软集团股份有限公司 The generation of protein function prediction model, protein function prediction technique and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003063001A1 (en) * 2002-01-18 2003-07-31 Bea Systems, Inc. System and method for http request preprocessing for servlets and application servers
US20040128328A1 (en) * 2002-12-31 2004-07-01 International Business Machines Corporation Method and apparatus for relaxed transactional isolation in a client-server caching architecture
CN1560741A (en) * 2004-02-23 2005-01-05 史宇清 Structure method of five-hierarchical system structure base on J2EE
CN101630346A (en) * 2009-06-26 2010-01-20 上海大学 Method based on support vector machine for on-line prediction of interaction of protein and nucleic acid
CN101727539A (en) * 2009-11-26 2010-06-09 上海大学 On-line forecasting method of enzyme and substrate interaction classification based on nearest neighbor algorithm

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHANG C C,ET AL.,: "LIBSVM: A library for support vector machines", 《ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (TIST)》, vol. 2, no. 3, 31 December 2011 (2011-12-31), pages 1 - 39 *
QIU J D, ET AL.,: "Prediction of G-protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: an approach from discrete wavelet transform", 《ANALYTICAL BIOCHEMISTRY》, vol. 390, no. 1, 11 April 2009 (2009-04-11), pages 68 - 73, XP026130726, DOI: doi:10.1016/j.ab.2009.04.009 *
QIU J D,ET AL.,: "Predicting subcellular location of apoptosis proteins based on wavelet transform and support vector machine", 《 AMINO ACIDS》, vol. 38, no. 4, 4 April 2009 (2009-04-04), pages 1201 - 1208, XP019805419 *
QIU J D,ET AL.,: "Using support vector machines for prediction of protein structural classes based on discrete wavelet transform", 《JOURNAL OF COMPUTATIONAL CHEMISTRY》, vol. 30, no. 8, 13 November 2008 (2008-11-13), pages 1344 - 1358 *
YU X,ET AL.,: "Predicting subcellular location of apoptosis proteins with pseudo amino acid composition: approach from amino acid substitution matrix and auto covariance transformation", 《AMINO ACIDS》, vol. 42, no. 5, 23 February 2011 (2011-02-23), pages 1619 - 1625, XP035042775, DOI: doi:10.1007/s00726-011-0848-8 *
罗三华: "小波支持向量机在蛋白质结构功能预测中的应用", 《中国优秀硕士学位论文全文数据库 基础科学辑》, no. 5, 15 May 2010 (2010-05-15), pages 006 - 20 *
黄建华: "蛋白质分类预测中的新方法研究", 《中国优秀硕士学位论文全文数据库 基础科学辑》, no. 4, 15 April 2011 (2011-04-15), pages 006 - 58 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107924429A (en) * 2015-04-14 2018-04-17 皮阿赛勒公司 Method and electronic system, related computer program product at least one fitness value for predicting protein
CN106599611A (en) * 2016-12-09 2017-04-26 中南大学 Marking method and system for protein functions
CN106599611B (en) * 2016-12-09 2019-04-30 中南大学 Protein function mask method and system
CN106529206A (en) * 2016-12-20 2017-03-22 大连海事大学 Automatic wiring method of protein two-dimensional structure diagram function element
CN106529206B (en) * 2016-12-20 2019-02-22 大连海事大学 A kind of automatic wiring method of protein two-dimensional structure figure function element
CN107423577A (en) * 2017-04-20 2017-12-01 北京工业大学 A kind of On Protein Fold Type Recognition based on amino acid sequence
CN107423577B (en) * 2017-04-20 2020-09-25 北京工业大学 Protein folding type identification method based on amino acid sequence
CN107563150A (en) * 2017-08-31 2018-01-09 深圳大学 Forecasting Methodology, device, equipment and the storage medium of protein binding site
CN107563150B (en) * 2017-08-31 2021-03-19 深圳大学 Method, device, equipment and storage medium for predicting protein binding site
CN109147868A (en) * 2018-07-18 2019-01-04 深圳大学 Protein function prediction technique, device, equipment and storage medium
CN109817275A (en) * 2018-12-26 2019-05-28 东软集团股份有限公司 The generation of protein function prediction model, protein function prediction technique and device
CN109817275B (en) * 2018-12-26 2020-12-01 东软集团股份有限公司 Protein function prediction model generation method, protein function prediction device, and computer readable medium

Similar Documents

Publication Publication Date Title
CN107291822B (en) Problem classification model training method, classification method and device based on deep learning
CN103473483A (en) Online predicting method for structure and function of protein
Gong et al. Psla: Improving audio tagging with pretraining, sampling, labeling, and aggregation
Li et al. Fault diagnosis of transformer windings based on decision tree and fully connected neural network
CN110489424A (en) A kind of method, apparatus, storage medium and the electronic equipment of tabular information extraction
CN111582315A (en) Sample data processing method and device and electronic equipment
CN117217807B (en) Bad asset estimation method based on multi-mode high-dimensional characteristics
CN112749277B (en) Medical data processing method, device and storage medium
CN110073374A (en) Model learning device and model learning method
Qi et al. CISO: Co-iteration semi-supervised learning for visual object detection
CN111863135B (en) False positive structure variation filtering method, storage medium and computing device
CN109408175A (en) Real-time interaction method and system in general high-performance deep learning computing engines
CN110889290B (en) Text encoding method and apparatus, text encoding validity checking method and apparatus
CN101630346A (en) Method based on support vector machine for on-line prediction of interaction of protein and nucleic acid
CN117315686A (en) Oracle auxiliary decoding classification method and system based on classification model
CN109326324B (en) Antigen epitope detection method, system and terminal equipment
Wang et al. Pepe: Plain efficient pretrained embeddings for sound event detection
CN116429912A (en) Convolution self-coding-based ultrasonic lamb wave frequency dispersion compensation method and device
CN115953394A (en) Target segmentation-based detection method and system for mesoscale ocean vortexes
Hu et al. A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition
CN115049546A (en) Sample data processing method and device, electronic equipment and storage medium
CN110852102B (en) Chinese part-of-speech tagging method and device, storage medium and electronic equipment
Yuan et al. A decoupled yolov5 with deformable convolution and multi-scale attention
CN117152669B (en) Cross-mode time domain video positioning method and system
Liu et al. Multi-task feature-aligned head in one-stage object detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131225