CN109522185A - A kind of method that model segmentation improves arithmetic speed - Google Patents

A kind of method that model segmentation improves arithmetic speed

Info

Publication number
CN109522185A
Authority
CN
China
Prior art keywords
cpu
gpu
model
arithmetic speed
inspection software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811375250.8A
Other languages
Chinese (zh)
Inventor
***
邹捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Radium Intelligent Technology Co Ltd
Original Assignee
Jiangsu Radium Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Radium Intelligent Technology Co Ltd
Priority to CN201811375250.8A
Publication of CN109522185A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3423 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time where the assessed time is active or idle time

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for improving computation speed by model segmentation, comprising the following steps. Step 1: connect the test board to a camera and to input/output devices. Step 2: debug the test board, enter the system, open the inspection software, and initialize it. Step 3: turn on the camera and start running. Step 4: use the inspection software to check the separate running times of the GPU and the CPU, and compare the two. Step 5: based on this analysis, choose a suitable node and split the model's entire computation at that node, so that the first part is computed on the GPU and the second part on the CPU. Because the first part runs on the GPU and the second part on the CPU, the split point of the model can be designed manually according to the characteristics of both the model's computation and the hardware. This makes full use of CPU and GPU performance and substantially increases computation speed.

Description

A method for improving computation speed by model segmentation
Technical field
The present invention relates to the technical field of deep learning computation speed, and specifically to a method for improving computation speed by model segmentation.
Background art
The concept of deep learning originates from research on artificial neural networks. A multilayer perceptron with multiple hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data.
The concept of deep learning was proposed by Hinton et al. in 2006. They proposed an unsupervised greedy layer-wise training algorithm based on the deep belief network (DBN), which brought hope for solving the optimization problems associated with deep structures, and subsequently proposed the multilayer autoencoder deep structure. In addition, the convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm; it uses spatial correlation to reduce the number of parameters and thereby improve training performance.
Deep learning is a machine learning method based on representation learning of data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Some representations make it easier to learn tasks from examples (for example, face recognition or facial expression recognition). The benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the human brain for analysis and learning; it imitates the mechanisms of the human brain to interpret data such as images, sound, and text.
In deep learning computation, processing is almost always done on the GPU. The CPU is also used, but with low efficiency. In fact, CPUs and GPUs follow different design philosophies. The GPU excels at highly parallel numerical computation, whether graphics-related or not; it can accommodate thousands of numerical computation threads that have no logical dependencies on one another, and its strength is parallel computation over data without logical dependencies. The CPU excels at program tasks that involve complex instruction scheduling, loops, branches, logical decisions, and execution, such as operating systems, system software, and general-purpose applications. Its parallelism lies at the program-execution level, and the complexity of program logic limits the instruction-level parallelism of execution; programs that run on the order of a hundred concurrent threads are rarely seen.
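The contrast drawn above can be illustrated with a small Python sketch. Both functions are hypothetical examples, not taken from the patent: `scale_all` stands for work in which every output depends only on its own input (the kind of computation that spreads across many independent threads), while `greedy_keep` stands for branch-heavy logic in which each decision depends on earlier decisions.

```python
from typing import List


def scale_all(values: List[int], factor: int) -> List[int]:
    """Data-parallel style: each output depends only on its own input,
    so the elements could be computed in any order or all at once."""
    return [v * factor for v in values]


def greedy_keep(scores: List[int], min_gap: int) -> List[int]:
    """Logic-heavy style: whether a score is kept depends on which
    scores were kept before it, so the loop is inherently sequential."""
    kept: List[int] = []
    for s in sorted(scores, reverse=True):
        # Data-dependent branch: compare against everything kept so far.
        if all(abs(s - k) >= min_gap for k in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    print(scale_all([1, 2, 3], 2))
    print(greedy_keep([90, 85, 50, 45], 10))
```

The first function maps naturally onto GPU-style execution; the second resembles the later, logic-oriented stages that the document argues are better suited to the CPU.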
Neural networks mainly perform matrix operations, with large numbers of matrices nested within matrices. Matrix operations are simple and repetitive, much like ordinary addition and subtraction, but the number of matrices to be computed is huge, so handling this part on the GPU is natural. For the logical-operation part of deep learning, however, continuing to use the GPU is inappropriate. An improved technique is therefore urgently needed to solve this problem in the prior art.
Summary of the invention
The purpose of the present invention is to provide a method for improving computation speed by model segmentation. Research shows that although many current deep learning frameworks can automatically distribute computation between the GPU and the CPU, the result in operation is clearly unsatisfactory, because the allocation does not fully take the respective strengths and weaknesses of the two processors into account. Over the whole course of deep learning inference, the early stage consists mainly of highly parallel numerical computation (graphics-related or not), while the later stage consists mainly of logical operations. The model's entire computation is therefore split: the first part is computed on the GPU and the second part on the CPU. In this way the characteristics of the model's computation are matched to the characteristics of the hardware, maximizing computation speed and thus improving the speed at which deep learning applications run locally, which solves the problem raised in the background above.
To achieve the above object, the present invention provides the following technical solution: a method for improving computation speed by model segmentation, comprising the following steps:
Step 1: connect the test board to the camera and to the input/output devices;
Step 2: debug the test board, enter the system, open the inspection software, and initialize it;
Step 3: turn on the camera and start running;
Step 4: use the inspection software to check the separate running times of the GPU and the CPU, and compare the two;
Step 5: based on the analysis, choose a suitable node and split the model's entire computation at that node; the first part is computed on the GPU and the second part on the CPU.
Preferably, the test board in step 1 comprises an image processing evaluation board.
Preferably, the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable.
Preferably, the inspection software in step 2 may be one or more of the Chrome browser, CPU-Z, and GPU-Z.
Preferably, the CPU in step 4 comprises an arithmetic unit, a cache memory, and the data, control, and status buses that connect them.
Preferably, the model in step 5 is the entire duration over which the inspection software monitors the operation of the detection device.
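Steps 4 and 5 can be sketched in Python as follows. The patent chooses the node manually by inspecting the timings; the automated search below is an assumption added for illustration, as are the function name `best_split`, the `transfer_ms` hand-off cost, and all node names and timings in the example.

```python
from typing import List, Tuple

# (node name, time on GPU in ms, time on CPU in ms), in execution order.
NodeTiming = Tuple[str, float, float]


def best_split(nodes: List[NodeTiming], transfer_ms: float = 0.0) -> Tuple[int, float]:
    """Return (k, total_ms) where nodes[:k] run on the GPU and nodes[k:]
    run on the CPU, choosing the k with the smallest total time."""
    best_k, best_total = 0, float("inf")
    for k in range(len(nodes) + 1):
        gpu_part = sum(gpu for _, gpu, _ in nodes[:k])
        cpu_part = sum(cpu for _, _, cpu in nodes[k:])
        # The hand-off cost applies only when both devices are used.
        handoff = transfer_ms if 0 < k < len(nodes) else 0.0
        total = gpu_part + cpu_part + handoff
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


if __name__ == "__main__":
    # Made-up timings: matrix-heavy nodes are fast on the GPU,
    # logic-heavy nodes are fast on the CPU.
    timings = [
        ("conv1", 2.0, 20.0),
        ("conv2", 3.0, 30.0),
        ("decode", 8.0, 1.0),
        ("nms", 12.0, 2.0),
    ]
    k, total = best_split(timings, transfer_ms=0.5)
    print(f"split before node index {k} ({timings[k][0]}), total {total} ms")
```

With these made-up numbers the search places the two convolution nodes on the GPU and the two logic nodes on the CPU. Real timings would come from the measurements in step 4.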
Compared with the prior art, the beneficial effects of the present invention are as follows:
The model's entire computation is split, with the first part computed on the GPU and the second part on the CPU. The split point of the model can thus be designed manually according to the characteristics of both the model's computation and the hardware, making full use of CPU and GPU performance, maximizing computation speed, and thereby improving the speed at which deep learning applications run locally.
Brief description of the drawings
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is a diagram of the CPU nodes and the time each one takes.
Fig. 3 is a diagram of the GPU nodes and the time each one takes.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, the present invention provides a technical solution: a method for improving computation speed by model segmentation, comprising the following steps:
Step 1: connect the test board to the camera and to the input/output devices;
Step 2: debug the test board, enter the system, open the inspection software, and initialize it;
Step 3: turn on the camera and start running;
Step 4: use the inspection software to check the separate running times of the GPU and the CPU, and compare the two;
Step 5: based on the analysis, choose a suitable node and split the model's entire computation at that node; the first part is computed on the GPU and the second part on the CPU.
The test board in step 1 comprises an image processing evaluation board;
the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable;
the inspection software in step 2 may be one or more of the Chrome browser, CPU-Z, and GPU-Z;
the CPU in step 4 comprises an arithmetic unit, a cache memory, and the data, control, and status buses that connect them;
the model in step 5 is the entire duration over which the inspection software monitors the operation of the detection device.
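The split execution described in step 5 can be sketched as below. This is a plain-Python illustration under stated assumptions: the `"gpu"` and `"cpu"` strings are only labels recording where each stage would be placed, `run_split` and the three toy stages are hypothetical, and a real deployment would rely on the device-placement mechanism of whichever deep learning framework is in use.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[List[int]], List[int]]]


def run_split(stages: List[Stage], split: int, frame: List[int]):
    """Run every stage in order, tagging stages[:split] as GPU work and
    stages[split:] as CPU work. Returns (output, placement)."""
    placement = []
    data = frame
    for index, (name, fn) in enumerate(stages):
        device = "gpu" if index < split else "cpu"
        placement.append((name, device))
        # A real framework would dispatch fn to `device` at this point;
        # here the function simply runs in the current process.
        data = fn(data)
    return data, placement


if __name__ == "__main__":
    toy_stages: List[Stage] = [
        ("scale", lambda xs: [x * 2 for x in xs]),         # numeric, parallel
        ("bias", lambda xs: [x + 1 for x in xs]),          # numeric, parallel
        ("threshold", lambda xs: [x for x in xs if x > 4]),  # logic
    ]
    output, placement = run_split(toy_stages, split=2, frame=[1, 2, 3])
    print(output)
    print(placement)
```

The single `split` index mirrors the idea of one cut node: everything before it is assigned to one device and everything after it to the other.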
Embodiment one:
The image processing evaluation board is a Jetson TX2 and the camera model is LI-IMX274. The camera is connected to the Jetson TX2; the keyboard, mouse, USB hub, and AC adapter are then connected to the Jetson TX2, and the display is connected to the Jetson TX2 through the video output cable. System debugging is carried out, and once it is complete the system interface is entered. The Chrome browser is opened, the camera is run, and the running state and running times of the CPU and GPU are observed through chrome://tracing/. With conventional processing, in which the work is left to be distributed between the CPU and GPU automatically and the model is not split, the result achieved is 9 FPS.
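The timing information viewed in chrome://tracing/ is loaded from JSON in the Trace Event Format, in which complete events (`"ph": "X"`) carry a start time `ts` and a duration `dur` in microseconds, and metadata events (`"ph": "M"`, `"name": "process_name"`) give a readable name to each `pid`. The following sketch totals the time per named process. The patent does not say how its trace was produced, so the sample trace, its process names, and the function name `time_per_process` are all made up for illustration.

```python
import json
from collections import defaultdict
from typing import Dict


def time_per_process(trace_json: str) -> Dict[str, float]:
    """Sum the duration of complete ("X") events per process, in ms."""
    doc = json.loads(trace_json)
    # The format allows either {"traceEvents": [...]} or a bare list.
    events = doc["traceEvents"] if isinstance(doc, dict) else doc

    # Map each pid to the name given by its process_name metadata event.
    names = {}
    for ev in events:
        if ev.get("ph") == "M" and ev.get("name") == "process_name":
            names[ev["pid"]] = ev["args"]["name"]

    totals: Dict[str, float] = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X":
            label = names.get(ev["pid"], f"pid {ev['pid']}")
            totals[label] += ev["dur"] / 1000.0  # microseconds to ms
    return dict(totals)


if __name__ == "__main__":
    sample = json.dumps({"traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 1, "args": {"name": "GPU"}},
        {"ph": "M", "name": "process_name", "pid": 2, "args": {"name": "CPU"}},
        {"ph": "X", "name": "conv1", "pid": 1, "tid": 0, "ts": 0, "dur": 2000},
        {"ph": "X", "name": "conv2", "pid": 1, "tid": 0, "ts": 2000, "dur": 3000},
        {"ph": "X", "name": "nms", "pid": 2, "tid": 0, "ts": 5000, "dur": 1500},
    ]})
    print(time_per_process(sample))
```

Grouping by event `name` instead of by process would give per-node times of the kind plotted in the figures.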
Embodiment two:
Following Embodiment one, the running state and running times of the CPU and GPU are observed through chrome://tracing/, as shown in Figs. 2 and 3. A suitable node is chosen and the model is cut at this node: everything before the node runs on the GPU and everything after it runs on the CPU. The result achieved is 27 FPS, a 3-fold speed-up over Embodiment one.
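The reported figures can be checked with a short calculation. The two frame rates are the ones stated in the embodiments; the helper names are hypothetical, and the per-frame times are derived values, not measurements from the patent.

```python
def frame_time_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


def speedup(baseline_fps: float, improved_fps: float) -> float:
    """Ratio of the improved frame rate to the baseline frame rate."""
    return improved_fps / baseline_fps


if __name__ == "__main__":
    baseline_fps, split_fps = 9.0, 27.0  # Embodiment one, Embodiment two
    print(f"speed-up: {speedup(baseline_fps, split_fps):.1f}x")
    print(
        f"frame time: {frame_time_ms(baseline_fps):.1f} ms -> "
        f"{frame_time_ms(split_fps):.1f} ms"
    )
```

Going from 9 FPS to 27 FPS corresponds to the average time per frame falling from roughly 111 ms to roughly 37 ms.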
It can be seen that manually designing the split point of the model makes full use of CPU and GPU performance and can substantially increase computation speed.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, replacements, and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.

Claims (6)

1. A method for improving computation speed by model segmentation, characterized by comprising the following steps:
Step 1: connect the test board to the camera and to the input/output devices;
Step 2: debug the test board, enter the system, open the inspection software, and initialize it;
Step 3: turn on the camera and start running;
Step 4: use the inspection software to check the separate running times of the GPU and the CPU, and compare the two;
Step 5: based on the analysis, choose a suitable node and split the model's entire computation at that node; the first part is computed on the GPU and the second part on the CPU.
2. The method for improving computation speed by model segmentation according to claim 1, characterized in that the test board in step 1 comprises an image processing evaluation board.
3. The method for improving computation speed by model segmentation according to claim 1, characterized in that the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable.
4. The method for improving computation speed by model segmentation according to claim 1, characterized in that the inspection software in step 2 may be one or more of the Chrome browser, CPU-Z, and GPU-Z.
5. The method for improving computation speed by model segmentation according to claim 1, characterized in that the CPU in step 4 comprises an arithmetic unit, a cache memory, and the data, control, and status buses that connect them.
6. The method for improving computation speed by model segmentation according to claim 1, characterized in that the model in step 5 is the entire duration over which the inspection software monitors the operation of the detection device.
CN201811375250.8A 2018-11-19 2018-11-19 A kind of method that model segmentation improves arithmetic speed Pending CN109522185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811375250.8A CN109522185A (en) 2018-11-19 2018-11-19 A kind of method that model segmentation improves arithmetic speed

Publications (1)

Publication Number Publication Date
CN109522185A true CN109522185A (en) 2019-03-26

Family

ID=65776285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811375250.8A Pending CN109522185A (en) 2018-11-19 2018-11-19 A kind of method that model segmentation improves arithmetic speed

Country Status (1)

Country Link
CN (1) CN109522185A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279445A (en) * 2012-09-26 2013-09-04 上海中科高等研究院 Computing method and super-computing system for computing task
CN104732221A (en) * 2015-03-30 2015-06-24 郑州师范学院 SIFT feature matching method based on OpenCL parallel acceleration
US20170024849A1 (en) * 2015-07-23 2017-01-26 Sony Corporation Learning convolution neural networks on heterogeneous cpu-gpu platform
CN107515736A (en) * 2017-07-01 2017-12-26 广州深域信息科技有限公司 A kind of method for accelerating depth convolutional network calculating speed on embedded device
CN107516311A (en) * 2017-08-08 2017-12-26 中国科学技术大学 A kind of corn breakage rate detection method based on GPU embedded platforms

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONGFEIG54321: "TensorFlow timeline模块获取图中每个节点的执行时间", 《CSDN博客》 *
EDGEEXPLORER: "使用NVIDIA TX2作为Azure IoT Edge设备实现物体检测", 《CSDN博客》 *
有些代码不应该被忘记: "梳理TensorFlow模型在Jetson TX2上进行inference的主要流程", 《CSDN博客》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298437A (en) * 2019-06-28 2019-10-01 Oppo广东移动通信有限公司 Separation calculation method, apparatus, storage medium and the mobile terminal of neural network
CN110298437B (en) * 2019-06-28 2021-06-01 Oppo广东移动通信有限公司 Neural network segmentation calculation method and device, storage medium and mobile terminal
CN110490300A (en) * 2019-07-26 2019-11-22 苏州浪潮智能科技有限公司 A kind of operation accelerated method, apparatus and system based on deep learning
CN110490300B (en) * 2019-07-26 2022-03-15 苏州浪潮智能科技有限公司 Deep learning-based operation acceleration method, device and system
CN112114892A (en) * 2020-08-11 2020-12-22 北京奇艺世纪科技有限公司 Deep learning model obtaining method, loading method and selecting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190326