CN105279230A - Method and system for constructing internet application feature identification database with active learning method - Google Patents


Info

Publication number
CN105279230A
Authority
CN
China
Prior art keywords
internet
applications
application
feature
simulator
Prior art date
Legal status
Pending
Application number
CN201510588327.XA
Other languages
Chinese (zh)
Inventor
谭彦
李元新
龙云亮
邓博存
梁志禧
Current Assignee
Guangdong Shunde Zhongka Cloud Network Technology Co Ltd
SYSU CMU Shunde International Joint Research Institute
Original Assignee
Guangdong Shunde Zhongka Cloud Network Technology Co Ltd
SYSU CMU Shunde International Joint Research Institute
Priority date
2015-09-16
Filing date
2015-09-16
Publication date
2016-01-27
Application filed by Guangdong Shunde Zhongka Cloud Network Technology Co Ltd and SYSU CMU Shunde International Joint Research Institute
Priority to CN201510588327.XA
Publication of CN105279230A
Legal status: Pending (current)


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
                        • G06F16/25 Integrating or interfacing systems involving database management systems
                    • G06F16/90 Details of database functions independent of the retrieved data types
                        • G06F16/95 Retrieval from the web
                            • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a system for constructing an Internet application feature identification database with an active learning method. The method comprises: deploying a client simulator on the Internet; actively initiating interactive access to an Internet application; obtaining the protocol interaction messages exchanged while the application runs; extracting the fixed feature positions of the messages to generate interaction feature samples for the Internet application; training on the samples with a machine learning method to obtain an Internet application feature model; processing subsequent messages acquired by the simulator with the model; and writing the application feature entries obtained in real time into a feature database or updating the database with them. The method can dynamically learn the features of applications on the Internet, is flexible to operate, highly extensible and easy to deploy, and keeps the feature data in the application feature database up to date in real time; a database built at a single point can be used across the entire network.

Description

Method and system for constructing an Internet application feature identification database by active learning
Technical field
The present invention relates to the field of smart pipe technology, and in particular to a method and system for constructing an Internet application feature identification database by active learning.
Background art
Application feature identification is widely used in fields such as smart pipes and network security. It generally works by consulting an application feature library to identify the data traffic passing through. However, Internet application protocols change considerably, and a substantial number of applications do not comply with standards such as the RFCs; therefore, if the application feature library cannot be updated quickly and is used directly for application identification, the recognition rate is low.
In the prior art, application feature libraries are all compiled manually. Owing to differences in technology, deep packet inspection (DPI) functions all suffer from problems such as slow updates of the application feature library and an inability to detect all application traffic.
Summary of the invention
To overcome at least one of the defects (deficiencies) of the prior art described above, the present invention first proposes a method for constructing an Internet application feature identification database by active learning. The method can dynamically learn the features of applications on the Internet; it is flexible to operate, highly extensible and easy to deploy; the feature data in the application feature database are kept up to date in real time; and a database built at a single point can be used across the entire network.
The present invention also proposes a system for constructing an Internet application feature identification database by active learning.
To achieve these objectives, the technical solution of the present invention is as follows:
A method for constructing an Internet application feature identification database by active learning comprises the following steps:
1) Sample acquisition: deploy a simulator on the Internet and install on it the software of the specified Internet applications whose access is to be simulated; use the predefined software to actively initiate access to the Internet applications and obtain the protocol interaction process while the applications run; then, through a packet capture module, obtain the fixed flag-bit fields of the protocol packets during application operation, and extract the fixed feature positions of the packets to generate interaction feature samples for the Internet applications;
2) Training and learning: train on the training samples with a machine learning method to obtain an Internet application feature model;
3) Feature library generation: use the feature model to process the subsequent packets collected by the simulator, and write the application feature entries obtained in real time into the feature database or update the feature database with them.
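Step 1) does not say how the fixed flag-bit fields are located. One plausible reading, sketched below in Python under that assumption, is to compare packets captured for the same application byte by byte and keep the offsets whose value never changes; runs of consecutive fixed bytes then become candidate (offset, pattern) features. The function names and the `max_offset` cutoff are illustrative, not taken from the patent.

```python
from typing import Dict, List, Tuple

# Assumption: fixed fields are found by byte-wise comparison of packets from one application.


def fixed_byte_positions(packets: List[bytes], max_offset: int = 64) -> Dict[int, int]:
    """Return {offset: byte value} for every offset whose value is identical in all packets.

    Only the first `max_offset` bytes (and no more than the shortest packet) are compared.
    """
    if not packets:
        return {}
    limit = min(max_offset, min(len(p) for p in packets))
    fixed: Dict[int, int] = {}
    for offset in range(limit):
        value = packets[0][offset]
        if all(p[offset] == value for p in packets[1:]):
            fixed[offset] = value
    return fixed


def to_signature(fixed: Dict[int, int]) -> List[Tuple[int, bytes]]:
    """Group consecutive fixed offsets into (start offset, byte pattern) runs."""
    runs: List[Tuple[int, bytes]] = []
    start = None
    buf = bytearray()
    prev = None
    for offset in sorted(fixed):
        if prev is not None and offset == prev + 1:
            buf.append(fixed[offset])  # extend the current run
        else:
            if start is not None:
                runs.append((start, bytes(buf)))  # close the previous run
            start, buf = offset, bytearray([fixed[offset]])
        prev = offset
    if start is not None:
        runs.append((start, bytes(buf)))
    return runs


# Two captured requests from the same (hypothetical) application.
captured = [b"GET /a HTTP", b"GET /b HTTP"]
print(to_signature(fixed_byte_positions(captured)))
```

This prints `[(0, b'GET /'), (6, b' HTTP')]`: the varying byte at offset 5 splits the fixed bytes into two candidate patterns. In practice many more packets per application would be needed before a run could be trusted as a feature.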
In the method for constructing an application feature identification database by active learning disclosed by the present invention, a simulated virtual client initiates interactive access to real Internet applications to obtain protocol interaction features; a protocol feature extraction unit then converts the protocol interaction features into application feature records and writes them into the application feature identification database. The present invention can dynamically learn the features of Internet applications; it is flexible to operate, highly extensible and easy to deploy; the feature data in the application feature database are kept up to date in real time; and a database built at a single point can be used across the entire network.
Preferably, the application features obtained in step 1) comprise IP addresses, URLs, ports and elements.
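A feature entry built from the four preferred feature types might be represented as below. This is a hypothetical Python sketch: the field names, the prefix match on URLs, and the reading of "element" as a byte pattern that must appear in the payload are all assumptions, since the patent does not define the entry format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FeatureEntry:
    """Hypothetical entry format; None in a field means "do not care"."""
    app: str
    ip: Optional[str] = None
    url_prefix: Optional[str] = None
    port: Optional[int] = None
    element: Optional[bytes] = None  # assumed: byte pattern that must occur in the payload

    def matches(self, ip: str, url: Optional[str], port: int, payload: bytes) -> bool:
        if self.ip is not None and ip != self.ip:
            return False
        if self.url_prefix is not None and (url is None or not url.startswith(self.url_prefix)):
            return False
        if self.port is not None and port != self.port:
            return False
        if self.element is not None and self.element not in payload:
            return False
        return True


def identify(entries: Iterable[FeatureEntry], ip: str, url: Optional[str],
             port: int, payload: bytes) -> Optional[str]:
    """Return the application of the first matching entry, or None if nothing matches."""
    for entry in entries:
        if entry.matches(ip, url, port, payload):
            return entry.app
    return None
```

Because `identify` uses first-match semantics, entries would be ordered from most to least specific.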
A system for constructing an Internet application feature identification database by active learning, characterized in that it comprises a simulator active access unit, a sample generation unit, a machine learning unit and a stream processing unit;
The simulator active access unit: integrates Internet protocol access tools in a virtual machine, simulates user access behavior to access Internet application websites, and processes the results returned by the websites;
The sample generation unit: monitors the data packets exchanged between the active access unit of the virtual machine and the Internet applications, extracts the key feature information in the fixed flag-bit fields of the data packets according to predefined feature extraction rules, and generates interaction feature samples for the Internet applications;
The machine learning unit: trains on the training samples with a machine learning method to obtain an Internet application feature model;
The stream processing unit: uses the feature model to process the subsequent packets collected by the simulator, and writes the application feature entries obtained in real time into the feature database or updates the feature database with them.
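The simulator active access unit is described only at the block level. A minimal Python sketch of its access loop, assuming plain HTTP(S) targets, uses the standard library's `urllib.request` with a user-agent string chosen to mimic the application. The target URLs and the injectable `fetch` parameter (which allows testing without a network) are illustrative; packet capture itself would run separately, in the sample generation unit, and error handling is omitted.

```python
import urllib.request
from typing import Callable, Dict, List, Union

# Hypothetical sketch of the simulator's access loop; packet capture happens elsewhere.


def simulate_access(urls: List[str], user_agent: str,
                    fetch: Callable = urllib.request.urlopen,
                    timeout: float = 10.0) -> List[Dict[str, Union[str, int]]]:
    """Visit each URL as the simulated client and record status and body length.

    `fetch` defaults to urllib.request.urlopen; HTTP and network errors are not handled here.
    """
    records: List[Dict[str, Union[str, int]]] = []
    for url in urls:
        # Present the request the way the simulated application would.
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with fetch(req, timeout=timeout) as resp:
            body = resp.read()
            records.append({"url": url, "status": resp.status, "length": len(body)})
    return records
```

A real deployment would replay the application's own request sequence (login, API calls and so on) rather than a flat URL list; the loop only shows where the simulated client sits.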
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
Feature collection in the present invention is performed by a simulator deployed on the Internet and is independent of existing collection terminals; the generated feature database can directly replace an existing feature library or be inserted into an existing feature library for use by identification equipment. The present invention can also serve as a third-party database supplied to application identification equipment in an operator's network. It can help the operator distinguish the distribution and bandwidth occupancy of the various Internet applications carried in the transmission pipe, thereby helping the operator design more reasonable traffic packages; it can also provide personalized value-added services to customers, helping them analyze how the bandwidth they purchase is used and reducing their investment in traffic analysis, network management and network security.
Brief description of the drawings
Fig. 1 is a schematic diagram of automatic application feature library construction using the method of the present invention.
Fig. 2 is a schematic structural diagram of the system of the present invention.
Detailed description of the embodiments
The accompanying drawings are for illustration only and shall not be construed as limiting this patent. To better illustrate the present embodiment, some parts of the drawings are omitted, enlarged or reduced and do not represent the dimensions of the actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, a method for constructing an Internet application feature identification database by active learning comprises the following steps:
1) Sample acquisition: deploy a simulator on the Internet and install on it the software of the specified Internet applications whose access is to be simulated; use the predefined software to actively initiate access to the Internet applications and obtain the protocol interaction process while the applications run; then, through a packet capture module, obtain the fixed flag-bit fields of the protocol packets during application operation, and extract the fixed feature positions of the packets to generate interaction feature samples for the Internet applications;
2) Training and learning: train on the training samples with a machine learning method to obtain an Internet application feature model;
3) Feature library generation: use the feature model to process the subsequent packets collected by the simulator, and write the application feature entries obtained in real time into the feature database or update the feature database with them.
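Step 2) leaves the machine learning method unspecified. As a stand-in, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing on feature samples expressed as dictionaries (for example, `host` and `port` values plus an application `label`). The choice of naive Bayes and the sample layout are assumptions made for illustration, not the method claimed.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional

# Naive Bayes is a stand-in; the patent does not name the learning method.


def tokens(sample: Dict[str, object]) -> List[str]:
    """Turn a feature sample into 'name=value' tokens, ignoring the label."""
    return [f"{k}={v}" for k, v in sorted(sample.items()) if k != "label"]


class NaiveBayesAppModel:
    """Multinomial naive Bayes over feature tokens with add-alpha smoothing (alpha > 0)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, samples: Iterable[Dict[str, object]]) -> "NaiveBayesAppModel":
        for s in samples:
            label = str(s["label"])
            self.class_counts[label] += 1
            for t in tokens(s):
                self.token_counts[label][t] += 1
                self.vocab.add(t)
        return self

    def predict(self, sample: Dict[str, object]) -> Optional[str]:
        total = sum(self.class_counts.values())
        v = max(len(self.vocab), 1)  # guard against an empty vocabulary
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n / total)  # log prior
            for t in tokens(sample):
                score += math.log((counts[t] + self.alpha) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Any classifier that maps a sample to an application label could take its place; naive Bayes is used here only because it is short and needs no third-party library.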
In the method for constructing an application feature identification database by active learning disclosed by the present invention, a simulated virtual client initiates interactive access to real Internet applications to obtain protocol interaction features; a protocol feature extraction unit then converts the protocol interaction features into application feature records and writes them into the application feature identification database. The present invention can dynamically learn the features of Internet applications; it is flexible to operate, highly extensible and easy to deploy; the feature data in the application feature database are kept up to date in real time; and a database built at a single point can be used across the entire network.
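The protocol feature extraction step described above, and the "predefined feature extraction rules" of the sample generation unit, could take the form of a rule table applied to captured payloads. The Python sketch below assumes cleartext HTTP/1.x requests and uses hypothetical regular-expression rules for the method, path, Host and User-Agent fields; real rule sets, and encrypted protocols, would need different extractors.

```python
import re
from typing import Dict, Union

# Hypothetical predefined extraction rules: feature name -> regex with one capture group.
EXTRACTION_RULES = {
    "method": re.compile(rb"^([A-Z]+) "),
    "path": re.compile(rb"^[A-Z]+ (\S+) HTTP/1\.[01]\r\n"),
    "host": re.compile(rb"\r\nHost: ([^\r\n:]+)", re.IGNORECASE),
    "user_agent": re.compile(rb"\r\nUser-Agent: ([^\r\n]+)", re.IGNORECASE),
}


def generate_sample(label: str, payload: bytes, dst_ip: str,
                    dst_port: int) -> Dict[str, Union[str, int]]:
    """Apply every rule to one captured request payload and return a labelled sample."""
    sample: Dict[str, Union[str, int]] = {"label": label, "ip": dst_ip, "port": dst_port}
    for name, pattern in EXTRACTION_RULES.items():
        match = pattern.search(payload)
        if match:
            # latin-1 maps every byte to a character, so decoding never fails.
            sample[name] = match.group(1).decode("latin-1")
    return sample
```

The label comes from the simulator itself: because the simulator knows which application it launched, every captured sample is labelled without manual work.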
As shown in Fig. 2, a system for constructing an Internet application feature identification database by active learning is characterized in that it comprises a simulator active access unit, a sample generation unit, a machine learning unit and a stream processing unit;
The simulator active access unit: integrates Internet protocol access tools in a virtual machine, simulates user access behavior to access Internet application websites, and processes the results returned by the websites;
The sample generation unit: monitors the data packets exchanged between the active access unit of the virtual machine and the Internet applications, extracts the key feature information in the fixed flag-bit fields of the data packets according to predefined feature extraction rules, and generates interaction feature samples for the Internet applications;
The machine learning unit: trains on the training samples with a machine learning method to obtain an Internet application feature model;
The stream processing unit: uses the feature model to process the subsequent packets collected by the simulator, and writes the application feature entries obtained in real time into the feature database or updates the feature database with them.
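For the "write/update" behaviour of the stream processing unit, one simple realization, sketched here with Python's built-in `sqlite3` module, keys each entry on (application, feature type, feature value), inserts it the first time it is seen, and otherwise increments a hit counter and refreshes its timestamp. The table name, columns, `FEATURE_TYPES`, and the caller-supplied `classify` callable (standing in for the trained feature model) are assumptions.

```python
import sqlite3
import time
from typing import Callable, Dict, Iterable, Optional

# Assumed schema for the feature database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS app_features (
    app           TEXT NOT NULL,
    feature_type  TEXT NOT NULL,
    feature_value TEXT NOT NULL,
    hits          INTEGER NOT NULL,
    last_seen     REAL NOT NULL,
    PRIMARY KEY (app, feature_type, feature_value)
)
"""

FEATURE_TYPES = ("ip", "url", "host", "port")  # assumed feature fields to persist


def upsert_feature(conn: sqlite3.Connection, app: str, ftype: str,
                   value: str, seen_at: float) -> None:
    """Insert a new entry or update an existing one, without the newer ON CONFLICT syntax."""
    conn.execute(
        "INSERT OR IGNORE INTO app_features "
        "(app, feature_type, feature_value, hits, last_seen) VALUES (?, ?, ?, 0, ?)",
        (app, ftype, value, seen_at),
    )
    conn.execute(
        "UPDATE app_features SET hits = hits + 1, last_seen = ? "
        "WHERE app = ? AND feature_type = ? AND feature_value = ?",
        (seen_at, app, ftype, value),
    )


def process_stream(conn: sqlite3.Connection,
                   classify: Callable[[Dict[str, object]], Optional[str]],
                   samples: Iterable[Dict[str, object]],
                   now: Callable[[], float] = time.time) -> None:
    """Classify each incoming sample and write/update its feature entries."""
    for sample in samples:
        app = classify(sample)
        if app is None:
            continue  # unidentified traffic produces no entry
        for ftype in FEATURE_TYPES:
            if ftype in sample:
                upsert_feature(conn, app, ftype, str(sample[ftype]), now())
    conn.commit()
```

The hit counter gives a simple confidence signal: entries seen only once could be held back before being exported to identification equipment.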
Feature collection in the present invention is performed by a virtual machine deployed on the Internet and is independent of existing collection terminals; the generated feature database can directly replace an existing feature library or be inserted into an existing feature library for use by identification equipment.
At present, DPI technology is widely used in networks and the requirements for application identification accuracy keep rising. Most DPI equipment currently in use relies on feature library matching, and most of it must update its feature library to maintain high identification accuracy; otherwise, the recognition rate declines as Internet application protocols change. Operators therefore need to use third-party feature libraries to update the feature libraries of their equipment.
The present invention proposes a method that can automatically learn and generate a feature library. With this method, the latest feature library entries can be constructed and used to update the existing feature libraries of DPI equipment online, manually or automatically, so that equipment using DPI technology can maintain consistent recognition capability across the whole network and keep pace with the rapid updates of the mobile Internet, maintaining or even improving identification accuracy.
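Online updating of a DPI device's existing library from a third-party library could, under the assumption that every entry carries a version number, be a keyed merge that adds new entries and replaces an existing one only when the incoming version is newer. The Python sketch below uses a hypothetical layout keyed on (feature type, feature value); the patent does not specify the update protocol.

```python
from typing import Dict, Tuple

# Hypothetical layout: (feature type, feature value) -> (application, version).
Key = Tuple[str, str]
Library = Dict[Key, Tuple[str, int]]


def merge_update(device: Library, third_party: Library) -> Tuple[Library, int]:
    """Return the merged library and the number of entries added or replaced.

    The device library is not modified; an existing entry is replaced only by a newer version.
    """
    merged = dict(device)
    changed = 0
    for key, (app, version) in third_party.items():
        current = merged.get(key)
        if current is None or version > current[1]:
            merged[key] = (app, version)
            changed += 1
    return merged, changed
```

Returning a new dictionary rather than editing in place lets a device swap libraries atomically and keep the old one for rollback.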
The present invention can also serve as a third-party database supplied to application identification equipment in an operator's network. It can help the operator distinguish the distribution and bandwidth occupancy of the various Internet applications carried in the transmission pipe, thereby helping the operator design more reasonable traffic packages; it can also provide personalized value-added services to customers, helping them analyze how the bandwidth they purchase is used and reducing their investment in traffic analysis, network management and network security.
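To illustrate how identified traffic could help an operator see the distribution and bandwidth occupancy of applications, the sketch below sums byte counts per application and converts them to shares. The flow record layout (`host`, `bytes`) and the lookup-table classifier in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Mapping, Optional


def bandwidth_share(flows: Iterable[Mapping[str, object]],
                    classify: Callable[[Mapping[str, object]], Optional[str]]) -> Dict[str, float]:
    """Return each application's share of total bytes; unidentified flows count as 'unknown'."""
    totals: Dict[str, int] = defaultdict(int)
    for flow in flows:
        app = classify(flow) or "unknown"
        totals[app] += int(flow["bytes"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {app: b / grand_total for app, b in totals.items()}


# Usage with a hypothetical host -> application lookup standing in for the feature database.
lookup = {"a.example.com": "AppA", "b.example.net": "AppB"}
flows = [
    {"host": "a.example.com", "bytes": 600},
    {"host": "b.example.net", "bytes": 300},
    {"host": "cdn.example.org", "bytes": 100},
]
print(bandwidth_share(flows, lambda f: lookup.get(f["host"])))
```

Here the shares come out as 0.6 for AppA, 0.3 for AppB and 0.1 for unidentified traffic; a shrinking "unknown" share is one way to see whether the feature database is keeping up.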
Obviously, the above embodiments of the present invention are merely examples given to illustrate the present invention clearly and are not intended to limit its implementations. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (3)

1. A method for constructing an application feature identification database by active learning, characterized in that it comprises the following steps:
1) Sample acquisition: deploy a simulator on the Internet and install on it the software of the specified Internet applications whose access is to be simulated; use the predefined software to actively initiate access to the Internet applications and obtain the protocol interaction process while the applications run; then, through a packet capture module, obtain the fixed flag-bit fields of the protocol packets during application operation, and extract the fixed feature positions of the packets to generate interaction feature samples for the Internet applications;
2) Training and learning: train on the training samples with a machine learning method to obtain an Internet application feature model;
3) Feature library generation: use the feature model to process the subsequent packets collected by the simulator, and write the application feature entries obtained in real time into the feature database or update the feature database with them.
2. The method according to claim 1, characterized in that the application features obtained in step 1) comprise IP addresses, URLs, ports and elements.
3. A system for constructing an Internet application feature identification database by active learning, characterized in that it comprises a simulator active access unit, a sample generation unit, a machine learning unit and a stream processing unit;
The simulator active access unit: integrates Internet protocol access tools in a virtual machine, simulates user access behavior to access Internet application websites, and processes the results returned by the websites;
The sample generation unit: monitors the data packets exchanged between the active access unit of the virtual machine and the Internet applications, extracts the key feature information in the fixed flag-bit fields of the data packets according to predefined feature extraction rules, and generates interaction feature samples for the Internet applications;
The machine learning unit: trains on the training samples with a machine learning method to obtain an Internet application feature model;
The stream processing unit: uses the feature model to process the subsequent packets collected by the simulator, and writes the application feature entries obtained in real time into the feature database or updates the feature database with them.
CN201510588327.XA 2015-09-16 2015-09-16 Method and system for constructing internet application feature identification database with active learning method Pending CN105279230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510588327.XA CN105279230A (en) 2015-09-16 2015-09-16 Method and system for constructing internet application feature identification database with active learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510588327.XA CN105279230A (en) 2015-09-16 2015-09-16 Method and system for constructing internet application feature identification database with active learning method

Publications (1)

Publication Number Publication Date
CN105279230A 2016-01-27

Family

ID=55148244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510588327.XA Pending CN105279230A (en) 2015-09-16 2015-09-16 Method and system for constructing internet application feature identification database with active learning method

Country Status (1)

Country Link
CN (1) CN105279230A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102420701A (en) * 2011-11-28 2012-04-18 北京邮电大学 Method for extracting internet service flow characteristics
CN102938764A (en) * 2012-11-09 2013-02-20 北京神州绿盟信息安全科技股份有限公司 Application identification processing method and device
CN102984243A (en) * 2012-11-20 2013-03-20 杭州迪普科技有限公司 Automatic identification method and device applied to secure socket layer (SSL)
US20130097308A1 (en) * 2011-04-05 2013-04-18 Ss8 Networks, Inc. Collecting asymmetric data and proxy data on a communication network

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106850349A (en) * 2017-02-08 2017-06-13 杭州迪普科技股份有限公司 Method and device for extracting feature information
CN106850349B (en) * 2017-02-08 2020-01-03 杭州迪普科技股份有限公司 Feature information extraction method and device
CN109857726A (en) * 2019-02-27 2019-06-07 深信服科技股份有限公司 Application feature library maintenance method and device, electronic equipment and storage medium
CN109857726B (en) * 2019-02-27 2023-05-12 深信服科技股份有限公司 Application feature library maintenance method and device, electronic equipment and storage medium
CN111158704A (en) * 2020-01-02 2020-05-15 中国银行股份有限公司 Model establishing method, deployment flow generation method, device and electronic equipment
CN111158704B (en) * 2020-01-02 2023-08-22 中国银行股份有限公司 Model building method, deployment flow generating method, device and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160127)