CN109522185A - A kind of method that model segmentation improves arithmetic speed - Google Patents
- Publication number
- CN109522185A (application number CN201811375250.8A)
- Authority
- CN
- China
- Prior art keywords
- cpu
- gpu
- model
- arithmetic speed
- inspection software
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3024—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G06F11/3419—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
- G06F11/3423—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time where the assessed time is active or idle time
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for improving computation speed by model segmentation, comprising the following steps. Step 1: connect the test board to a camera and to input/output devices. Step 2: debug the test board, enter the system, then open the inspection software and initialize it. Step 3: turn on the camera and start running. Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and compare them. Step 5: based on this analysis, choose a suitable node and split the model's computation at that node, so that the first half is processed on the GPU and the second half on the CPU. By combining the characteristics of the model's computation with those of the hardware and designing the model's split point manually, the method makes full use of both the CPU and the GPU and thereby substantially increases computation speed.
Description
Technical field
The present invention relates to the technical field of deep learning computation speed, and specifically to a method for improving computation speed by model segmentation.
Background art
The concept of deep learning originates from research on artificial neural networks. A multilayer perceptron with several hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data.
The concept of deep learning was proposed by Hinton et al. in 2006. They proposed an unsupervised greedy layer-wise training algorithm based on the deep belief network (DBN), which brought hope for solving the optimization problems associated with deep structures, and later proposed the multilayer autoencoder deep structure. In addition, the convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm; it uses spatial correlation to reduce the number of parameters and thereby improve training performance.
Deep learning is a machine learning method based on representation learning of data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Some representations make it easier to learn a task from examples (for example, face recognition or facial expression recognition). The benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the analytical learning of the human brain, imitating the brain's mechanisms to interpret data such as images, sound, and text.
Deep learning computation is mostly run on the GPU. The CPU is also used, but inefficiently. In fact, CPUs and GPUs follow different design philosophies. The GPU is good at highly parallel numerical computation, whether graphics-related or not; it can host thousands of numerical computation threads that have no logical dependencies on one another, so its strength is parallel computation over logically independent data. The CPU is good at program tasks with complex instruction scheduling, loops, branches, logical decisions, and execution, such as operating systems, system software, and general-purpose applications. Its parallelism lies at the level of program execution, and the complexity of program logic limits the instruction-level parallelism that can be achieved, so programs that run hundreds of concurrent threads are rarely seen.
Neural networks mainly perform matrix operations, often on large numbers of matrices nested within matrices. Each matrix operation is simple, much like ordinary addition and subtraction, but the volume of computation is huge, so handling this part on the GPU is natural. Using the GPU for the logical operations in deep learning, however, is not appropriate. An improved technique is therefore urgently needed to solve this problem in the prior art.
Summary of the invention
The purpose of the present invention is to provide a method for improving computation speed by model segmentation. Research shows that although many current deep learning frameworks can automatically allocate computation between the GPU and the CPU, the resulting allocation is clearly unsatisfactory, because it does not adequately account for the respective strengths and weaknesses of the two processors. Over the whole course of deep learning inference, the early stage consists mainly of highly parallel numerical computation (graphics-related or not), while the late stage consists mainly of logical operations. The model's computation is therefore split in two: the first half is processed on the GPU and the second half on the CPU. This matches the characteristics of the model's computation to those of the hardware and maximizes computation speed, improving the speed of deep learning applications that run locally and solving the problems raised in the background section above.
To achieve the above purpose, the present invention provides the following technical solution: a method for improving computation speed by model segmentation, comprising the following steps:
Step 1: connect the test board to a camera and to input/output devices;
Step 2: debug the test board, enter the system, then open the inspection software and initialize it;
Step 3: turn on the camera and start running;
Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and compare them;
Step 5: based on this analysis, choose a suitable node and split the model's computation at that node, so that the first half is processed on the GPU and the second half on the CPU.
Preferably, the test board in step 1 includes an image processing evaluation board.
Preferably, the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable.
Preferably, the inspection software in step 2 may be one or more of the Chrome browser, CPU-Z, and GPU-Z.
Preferably, the CPU in step 4 includes an arithmetic unit, a cache memory, and a bus that carries the data, control, and status signals between them.
Preferably, the model in step 5 is the overall duration of device operation as monitored by the inspection software.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The model's computation is split in two, with the first half processed on the GPU and the second half on the CPU. The split point is designed manually according to the characteristics of both the model's computation and the hardware, which makes full use of the CPU and the GPU and maximizes computation speed, improving the speed of deep learning applications that run locally.
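Choosing the split node from measured timings can be sketched as follows. This is a minimal illustration, not the procedure described in the patent: it assumes the model is a linear chain of nodes, that a per-node time has been measured on each device, and that one frame passes through the GPU part and then the CPU part in sequence. The function name `best_split`, the `transfer_ms` hand-off cost, and the sample timings are all hypothetical.

```python
from itertools import accumulate


def best_split(gpu_ms, cpu_ms, transfer_ms=0.0):
    """Return (k, total_ms): nodes [0, k) run on the GPU, nodes [k, n) on the CPU."""
    if len(gpu_ms) != len(cpu_ms):
        raise ValueError("need one GPU time and one CPU time per node")
    n = len(gpu_ms)
    # gpu_prefix[k] = time to run the first k nodes on the GPU
    gpu_prefix = [0.0] + list(accumulate(gpu_ms))
    # cpu_suffix[k] = time to run nodes k..n-1 on the CPU
    cpu_suffix = list(accumulate(reversed(cpu_ms)))[::-1] + [0.0]
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        # The hand-off cost is only paid when both devices are actually used.
        handoff = transfer_ms if 0 < k < n else 0.0
        total = gpu_prefix[k] + handoff + cpu_suffix[k]
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


# Hypothetical per-node timings in milliseconds (not taken from the patent's figures).
gpu_ms = [2.0, 3.0, 2.5, 9.0, 8.0]   # the last two nodes are slow on the GPU
cpu_ms = [20.0, 30.0, 25.0, 1.0, 1.5]  # the first three nodes are slow on the CPU
split, total = best_split(gpu_ms, cpu_ms, transfer_ms=0.5)
print(split, total)
```

With these made-up numbers the cheapest cut is after the third node. In practice the hand-off cost between devices would also have to be measured, since it can outweigh the gain from moving a few cheap nodes.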
Description of the drawings
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is a diagram of the CPU nodes and their corresponding time consumption.
Fig. 3 is a diagram of the GPU nodes and their corresponding time consumption.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the present invention provides a technical solution: a method for improving computation speed by model segmentation, comprising the following steps:
Step 1: connect the test board to a camera and to input/output devices;
Step 2: debug the test board, enter the system, then open the inspection software and initialize it;
Step 3: turn on the camera and start running;
Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and compare them;
Step 5: based on this analysis, choose a suitable node and split the model's computation at that node, so that the first half is processed on the GPU and the second half on the CPU.
The test board in step 1 includes an image processing evaluation board;
The input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable;
The inspection software in step 2 may be one or more of the Chrome browser, CPU-Z, and GPU-Z;
The CPU in step 4 includes an arithmetic unit, a cache memory, and a bus that carries the data, control, and status signals between them;
The model in step 5 is the overall duration of device operation as monitored by the inspection software.
Embodiment 1:
The image processing evaluation board is a Jetson TX2 and the camera model is LI-IMX274. The camera is connected to the Jetson TX2; the keyboard, mouse, USB hub, and AC adapter are then connected to the Jetson TX2, and the display is connected to it through the video output cable. After system debugging is complete, the system interface is entered, the Chrome browser is opened, and the camera is started. The running state and running time of the CPU and the GPU are observed through chrome://tracing/. With the conventional approach, in which work is left to be allocated between the CPU and the GPU automatically and the model is not split, the result achieved is 9 FPS.
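The per-device running times that Embodiment 1 reads off chrome://tracing/ could also be totalled programmatically. The sketch below assumes the trace is available as JSON in the Chrome Trace Event Format, with complete events (`"ph": "X"`, durations in microseconds) and `process_name` metadata events naming each process. The helper name `busy_time_per_process` and the sample trace are hypothetical, and how processes map to devices depends on the tool that produced the trace.

```python
import json
from collections import defaultdict


def busy_time_per_process(trace):
    """Sum the duration of complete ("X") events per process, in microseconds."""
    names = {}                 # pid -> human-readable process name
    busy = defaultdict(int)    # pid -> summed event duration
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") == "M" and ev.get("name") == "process_name":
            names[ev["pid"]] = ev["args"]["name"]
        elif ev.get("ph") == "X":
            busy[ev["pid"]] += ev.get("dur", 0)
    # Fall back to the numeric pid when no metadata event named the process.
    return {names.get(pid, f"pid {pid}"): total for pid, total in busy.items()}


# Hypothetical trace with made-up node names and durations.
sample = json.loads("""
{"traceEvents": [
  {"ph": "M", "name": "process_name", "pid": 1, "args": {"name": "GPU"}},
  {"ph": "M", "name": "process_name", "pid": 2, "args": {"name": "CPU"}},
  {"ph": "X", "name": "conv1", "pid": 1, "tid": 0, "ts": 0,     "dur": 4000},
  {"ph": "X", "name": "conv2", "pid": 1, "tid": 0, "ts": 4000,  "dur": 6000},
  {"ph": "X", "name": "nms",   "pid": 2, "tid": 0, "ts": 10000, "dur": 1500}
]}
""")
print(busy_time_per_process(sample))
```

Summing durations overcounts when events nest or overlap within a process, so the totals are a rough guide for comparing devices rather than exact wall-clock time.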
Embodiment 2:
Following Embodiment 1, the running state and running time of the CPU and the GPU are observed through chrome://tracing/, as shown in Figs. 2-3. A suitable node is chosen and the model is cut at this node: everything before the node runs on the GPU and everything after it runs on the CPU. The result achieved is 27 FPS, a 3x speed-up over Embodiment 1.
It can be seen that manually designing the model's split point makes full use of the CPU and the GPU and can substantially increase computation speed.
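The reported frame rates can be converted into per-frame times to make the size of the gain concrete. The short calculation below uses only the 9 FPS and 27 FPS figures from the two embodiments; the helper name `frame_time_ms` is illustrative.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


baseline_fps = 9.0   # Embodiment 1: automatic CPU/GPU allocation, no split
split_fps = 27.0     # Embodiment 2: model cut at a manually chosen node

speedup = split_fps / baseline_fps
print(round(frame_time_ms(baseline_fps), 1))  # about 111.1 ms per frame
print(round(frame_time_ms(split_fps), 1))     # about 37.0 ms per frame
print(speedup)                                # 3.0
```

The source does not state whether the GPU and CPU halves overlap across consecutive frames, so these figures describe average throughput only, not the latency of a single frame.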
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.
Claims (6)
1. A method for improving computation speed by model segmentation, characterized by comprising the following steps:
Step 1: connect the test board to a camera and to input/output devices;
Step 2: debug the test board, enter the system, then open the inspection software and initialize it;
Step 3: turn on the camera and start running;
Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and compare them;
Step 5: based on this analysis, choose a suitable node and split the model's computation at that node, so that the first half is processed on the GPU and the second half on the CPU.
2. The method for improving computation speed by model segmentation according to claim 1, characterized in that the test board in step 1 includes an image processing evaluation board.
3. The method for improving computation speed by model segmentation according to claim 1, characterized in that the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable.
4. The method for improving computation speed by model segmentation according to claim 1, characterized in that the inspection software in step 2 may be one or more of the Chrome browser, CPU-Z, and GPU-Z.
5. The method for improving computation speed by model segmentation according to claim 1, characterized in that the CPU in step 4 includes an arithmetic unit, a cache memory, and a bus that carries the data, control, and status signals between them.
6. The method for improving computation speed by model segmentation according to claim 1, characterized in that the model in step 5 is the overall duration of device operation as monitored by the inspection software.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811375250.8A CN109522185A (en) | 2018-11-19 | 2018-11-19 | A kind of method that model segmentation improves arithmetic speed |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811375250.8A CN109522185A (en) | 2018-11-19 | 2018-11-19 | A kind of method that model segmentation improves arithmetic speed |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109522185A true CN109522185A (en) | 2019-03-26 |
Family
ID=65776285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811375250.8A Pending CN109522185A (en) | 2018-11-19 | 2018-11-19 | A kind of method that model segmentation improves arithmetic speed |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522185A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110298437A (en) * | 2019-06-28 | 2019-10-01 | Oppo广东移动通信有限公司 | Separation calculation method, apparatus, storage medium and the mobile terminal of neural network |
CN110490300A (en) * | 2019-07-26 | 2019-11-22 | 苏州浪潮智能科技有限公司 | A kind of operation accelerated method, apparatus and system based on deep learning |
CN112114892A (en) * | 2020-08-11 | 2020-12-22 | 北京奇艺世纪科技有限公司 | Deep learning model obtaining method, loading method and selecting method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279445A (en) * | 2012-09-26 | 2013-09-04 | 上海中科高等研究院 | Computing method and super-computing system for computing task |
CN104732221A (en) * | 2015-03-30 | 2015-06-24 | 郑州师范学院 | SIFT feature matching method based on OpenCL parallel acceleration |
US20170024849A1 (en) * | 2015-07-23 | 2017-01-26 | Sony Corporation | Learning convolution neural networks on heterogeneous cpu-gpu platform |
CN107515736A (en) * | 2017-07-01 | 2017-12-26 | 广州深域信息科技有限公司 | A kind of method for accelerating depth convolutional network calculating speed on embedded device |
CN107516311A (en) * | 2017-08-08 | 2017-12-26 | 中国科学技术大学 | A kind of corn breakage rate detection method based on GPU embedded platforms |
-
2018
- 2018-11-19 CN CN201811375250.8A patent/CN109522185A/en active Pending
Non-Patent Citations (3)
Title |
---|
DONGFEIG54321: "TensorFlow timeline模块获取图中每个节点的执行时间", 《CSDN博客》 * |
EDGEEXPLORER: "使用NVIDIA TX2作为Azure IoT Edge设备实现物体检测", 《CSDN博客》 * |
有些代码不应该被忘记: "梳理TensorFlow模型在Jetson TX2上进行inference的主要流程", 《CSDN博客》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110298437A (en) * | 2019-06-28 | 2019-10-01 | Oppo广东移动通信有限公司 | Separation calculation method, apparatus, storage medium and the mobile terminal of neural network |
CN110298437B (en) * | 2019-06-28 | 2021-06-01 | Oppo广东移动通信有限公司 | Neural network segmentation calculation method and device, storage medium and mobile terminal |
CN110490300A (en) * | 2019-07-26 | 2019-11-22 | 苏州浪潮智能科技有限公司 | A kind of operation accelerated method, apparatus and system based on deep learning |
CN110490300B (en) * | 2019-07-26 | 2022-03-15 | 苏州浪潮智能科技有限公司 | Deep learning-based operation acceleration method, device and system |
CN112114892A (en) * | 2020-08-11 | 2020-12-22 | 北京奇艺世纪科技有限公司 | Deep learning model obtaining method, loading method and selecting method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Few-shot cotton pest recognition and terminal realization | |
Hu et al. | Attention-based multi-context guiding for few-shot semantic segmentation | |
CN106982359B (en) | A kind of binocular video monitoring method, system and computer readable storage medium | |
CN106778682B (en) | A kind of training method and its equipment of convolutional neural networks model | |
CN109034210A (en) | Object detection method based on super Fusion Features Yu multi-Scale Pyramid network | |
CN109522185A (en) | A kind of method that model segmentation improves arithmetic speed | |
CN109446889A (en) | Object tracking method and device based on twin matching network | |
Wang et al. | Synapse maintenance in the where-what networks | |
Lopes et al. | Restricted Boltzmann machines and deep belief networks on multi-core processors | |
CN110909680A (en) | Facial expression recognition method and device, electronic equipment and storage medium | |
CN105630648B (en) | Data center's intelligent control method and system based on multidimensional data deep learning | |
CN108334878A (en) | Video images detection method and apparatus | |
CN110008853A (en) | Pedestrian detection network and model training method, detection method, medium, equipment | |
CN110363090A (en) | Intelligent heart disease detection method, device and computer readable storage medium | |
Ding et al. | Multiple lesions detection of fundus images based on convolution neural network algorithm with improved SFLA | |
Gobron et al. | Retina simulation using cellular automata and GPU programming | |
Ghaleb et al. | Skeleton-based explainable bodily expressed emotion recognition through graph convolutional networks | |
Khavalko et al. | Image classification and recognition on the base of autoassociative neural network usage | |
Zhang et al. | Target detection of banana string and fruit stalk based on YOLOv3 deep learning network | |
Ye et al. | Broiler stunned state detection based on an improved fast region-based convolutional neural network algorithm | |
Choudhury et al. | Quality evaluation in guavas using deep learning architectures: an experimental review | |
WO2017001885A2 (en) | Method of generating a model of an object | |
Shen et al. | Facial expression recognition based on bidirectional gated recurrent units within deep residual network | |
CN110414560A (en) | A kind of autonomous Subspace clustering method for high dimensional image | |
CN108710958A (en) | A kind of prediction health control method and device, computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190326 |