CN109522185A

CN109522185A - A method for improving computation speed through model segmentation

Info

Publication number: CN109522185A
Application number: CN201811375250.8A
Authority: CN
Inventors: ***; Zou Jie (邹捷)
Original assignee: Jiangsu Radium Intelligent Technology Co Ltd
Current assignee: Jiangsu Radium Intelligent Technology Co Ltd
Priority date: 2018-11-19
Filing date: 2018-11-19
Publication date: 2019-03-26

Abstract

The invention discloses a method for improving computation speed through model segmentation. It comprises the following steps. Step 1: connect the test board to a camera and to input/output devices. Step 2: boot the test board into the system, then open and initialize the inspection software. Step 3: start the camera and begin running. Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and analyze their relative running times. Step 5: based on this analysis, choose a suitable node and cut the model's overall computation at that node, so that the first half is computed on the GPU and the second half on the CPU. Because the first half runs on the GPU and the second half on the CPU, the split point of the model can be designed manually to match both the characteristics of the model's computation and those of the hardware. This makes full use of the performance of the CPU and the GPU and greatly increases computation speed.

Description

A method for improving computation speed through model segmentation

Technical field

The present invention relates to the technical field of deep learning computation speed, and specifically to a method for improving computation speed through model segmentation.

Background art

The concept of deep learning derives from research on artificial neural networks. A multilayer perceptron with several hidden layers is one kind of deep learning structure. Deep learning combines low-level features into more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of the data.

The concept of deep learning was proposed by Hinton et al. in 2006. They proposed an unsupervised, greedy layer-by-layer training algorithm based on deep belief networks (DBN). This offered hope for solving optimization problems associated with deep structures, and they then proposed the deep multilayer autoencoder structure. In addition, the convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm. It exploits spatial correlations to reduce the number of parameters and thereby improve training performance.

Deep learning is a machine learning method based on representation learning of data. An observation, such as an image, can be represented in many ways: as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Some representations make it easier to learn tasks from examples, such as face recognition or facial expression recognition. The benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.

Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the analytical learning of the human brain, imitating the brain's mechanisms to interpret data such as images, sounds, and text.

In deep learning computation, processing is almost always done on the GPU. The CPU is also used, but not efficiently. In fact, CPUs and GPUs are designed on different principles.

The GPU excels at highly parallel numerical computation, whether graphics-related or not. It can host thousands of numerical threads with no logical dependencies between them, so its strength is parallel computation on data without logical dependencies.

The CPU excels at tasks with complex instruction scheduling, loops, branches, and logical decisions, such as operating systems, system software, and general-purpose applications. Its parallelism lies at the program-execution level, and the complexity of the program logic limits the instruction-level parallelism available. Programs that execute hundreds of threads concurrently are rarely seen on a CPU.

Neural networks mainly perform matrix operations, and large matrices are built from smaller ones. Matrix operations are simple and repetitive, much like ordinary addition and subtraction, so it is natural to process this large volume of matrix computation on the GPU. The logical-operation part of deep learning, however, is poorly suited to the GPU, yet it is still processed there. An improved technique is therefore urgently needed to solve this problem in the prior art.

Summary of the invention

The purpose of the present invention is to provide a method for improving computation speed through model segmentation.

Research has found that many current deep learning frameworks can automatically distribute processing between the GPU and the CPU, but the results at runtime are clearly unsatisfactory. These frameworks do not fully exploit the respective strengths and weaknesses of the two processors when allocating computation. Over the course of deep learning inference, the early stage consists mainly of highly parallel numerical computation, graphics-related or not, while the later stage consists mainly of logical operations.

The invention therefore cuts the model's overall computation into two parts: the first half is processed on the GPU and the second half on the CPU. The split combines the characteristics of the model's computation with those of the hardware. This maximizes computation speed, and therefore the speed of deep learning applications deployed locally, which solves the problems described in the background above.
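The GPU-front/CPU-back division described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The model is assumed to be a plain sequence of layers. The split index `k` and the callbacks `run_on_gpu`/`run_on_cpu` are hypothetical names; in a real framework those callbacks would place each layer's computation on the corresponding device.

```python
from typing import Any, Callable, List, Sequence

# A "layer" is any callable mapping an input to an output (assumption).
Layer = Callable[[Any], Any]
# Hypothetical dispatch hook: runs one layer on a particular device.
Runner = Callable[[Layer, Any], Any]


def run_split(layers: Sequence[Layer], k: int, x: Any,
              run_on_gpu: Runner, run_on_cpu: Runner) -> Any:
    """Run layers[:k] via run_on_gpu, then layers[k:] via run_on_cpu."""
    if not 0 <= k <= len(layers):
        raise ValueError(f"split index {k} outside 0..{len(layers)}")
    for layer in layers[:k]:   # front half: parallel numeric work
        x = run_on_gpu(layer, x)
    for layer in layers[k:]:   # back half: branch-heavy logical work
        x = run_on_cpu(layer, x)
    return x


if __name__ == "__main__":
    log: List[str] = []

    def tagged(device: str) -> Runner:
        # Stand-in runner that only records which device "ran" each layer.
        def run(layer: Layer, v: Any) -> Any:
            log.append(device)
            return layer(v)
        return run

    layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    print(run_split(layers, 2, 5, tagged("gpu"), tagged("cpu")))  # 9
    print(log)  # ['gpu', 'gpu', 'cpu']
```

Passing the device placement in as callbacks keeps the split logic independent of any particular framework. Only the choice of `k` encodes the segmentation decision.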

To achieve the above purpose, the invention provides the following technical solution: a method for improving computation speed through model segmentation, comprising the following steps:

Step 1: connect the test board to the camera and to the input/output devices;

Step 2: boot the test board into the system, then open and initialize the inspection software;

Step 3: start the camera and begin running;

Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and analyze their relative running times;

Step 5: based on the analysis, choose a suitable node and cut the model's overall computation at that node, with the first half computed on the GPU and the second half on the CPU.
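Steps 4 and 5 amount to choosing the node at which to cut, given per-node timings. The patent describes making this choice manually. One way to make it explicit is an exhaustive search over cut positions, under these assumptions:

- The model runs as a single sequential chain.
- Per-frame latency is the GPU time of the front nodes, plus a fixed hand-off cost, plus the CPU time of the back nodes.

The per-node timing lists and the `transfer_ms` hand-off cost are assumed inputs, for example read off traces like those in Figs. 2 and 3. The numbers in the example are invented.

```python
from typing import Sequence, Tuple


def best_split(gpu_ms: Sequence[float], cpu_ms: Sequence[float],
               transfer_ms: float = 0.0) -> Tuple[int, float]:
    """Return (k, total_ms): nodes [0, k) on the GPU, nodes [k, n) on the CPU.

    Assumes strictly sequential execution (no overlap between the GPU and
    CPU halves) and a fixed cost for handing the intermediate result over.
    Ties keep the smallest k.
    """
    if len(gpu_ms) != len(cpu_ms):
        raise ValueError("need one GPU and one CPU timing per node")
    n = len(gpu_ms)
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        # A hand-off only happens when both halves are non-empty.
        hand_off = transfer_ms if 0 < k < n else 0.0
        total = sum(gpu_ms[:k]) + hand_off + sum(cpu_ms[k:])
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


# Hypothetical per-node timings in ms, not measurements from the patent.
gpu = [5, 8, 12, 6, 4]
cpu = [40, 60, 90, 3, 2]
print(best_split(gpu, cpu, transfer_ms=1.0))  # (3, 31.0)
```

With these made-up numbers the cut falls after the third node. The first three nodes are far cheaper on the GPU, while the last two cost little on the CPU. If the two halves were instead pipelined across consecutive frames, throughput would depend on the slower half rather than on the sum, and the cost function would need to change accordingly.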

Preferably, the test board in step 1 includes an image processing evaluation board.

Preferably, the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable.

Preferably, the inspection software in step 2 can be one or more of the Chrome browser, CPU-Z, and GPU-Z.

Preferably, the CPU in step 4 includes an arithmetic unit, a cache memory, and the buses that carry data, control, and status signals between them.

Preferably, in step 5, the model running time is the overall duration of the device's operation as monitored by the inspection software.

Compared with the prior art, the present invention has the following beneficial effects:

The model's overall computation is cut into two parts, with the first half processed on the GPU and the second half on the CPU. The split point is designed manually to match the characteristics of the model's computation to those of the hardware. This makes full use of the performance of the CPU and GPU and maximizes computation speed, which in turn improves the speed of locally deployed deep learning applications.

Brief description of the drawings

Fig. 1 is a flow diagram of the present invention.

Fig. 2 is a timing diagram of the CPU nodes and the time each consumes.

Fig. 3 is a timing diagram of the GPU nodes and the time each consumes.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that those of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the present invention.

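As shown in Fig. 1, the present invention provides a technical solution: a method for improving computation speed through model segmentation, comprising the following steps:

Step 1: connect the test board to the camera and to the input/output devices;

Step 2: boot the test board into the system, then open and initialize the inspection software;

Step 3: start the camera and begin running;

Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and analyze their relative running times;

Step 5: based on the analysis, choose a suitable node and cut the model's overall computation at that node, with the first half computed on the GPU and the second half on the CPU;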

The test board in step 1 includes an image processing evaluation board;

The input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable;

The inspection software in step 2 can be one or more of the Chrome browser, CPU-Z, and GPU-Z;

The CPU in step 4 includes an arithmetic unit, a cache memory, and the buses that carry data, control, and status signals between them;

In step 5, the model running time is the overall duration of the device's operation as monitored by the inspection software.

Embodiment one:

The image processing evaluation board is a Jetson TX2, and the camera model is LI-IMX274. The setup proceeds as follows:

1. Connect the camera to the Jetson TX2.
2. Connect the keyboard, mouse, USB hub, and AC adapter to the Jetson TX2, and connect the display through the video output cable.
3. Debug the system. When debugging is complete, enter the system interface, open the Chrome browser, and run the camera.
4. Observe the running state of the CPU and GPU, and their running times, through chrome://tracing/.

With conventional processing, the allocation between CPU and GPU is left to be made automatically and the model is not split. Under these conditions the result achieved is 9 FPS.
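The embodiment reads per-node CPU and GPU times from chrome://tracing/. A trace loaded there follows Chrome's Trace Event Format:

- Complete events (`"ph": "X"`) carry a name, a process id, and a duration `dur` in microseconds.
- Metadata events named `process_name` label each process id.

The sketch below totals durations per (process, event name) pair. It assumes the trace labels devices through `process_name` metadata, which depends on the tool that produced the trace; the patent does not say which tool was used. The device and node names in the example are invented.

```python
import json
from collections import defaultdict
from typing import Dict, Tuple


def per_node_times_ms(trace_json: str) -> Dict[Tuple[str, str], float]:
    """Sum complete-event durations per (process label, event name), in ms."""
    data = json.loads(trace_json)
    # The format allows a bare event array or an object with "traceEvents".
    events = data["traceEvents"] if isinstance(data, dict) else data
    # Metadata events ("ph": "M", name "process_name") label each pid.
    labels = {e["pid"]: e["args"]["name"] for e in events
              if e.get("ph") == "M" and e.get("name") == "process_name"}
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for e in events:
        if e.get("ph") != "X":  # only complete events carry "dur"
            continue
        label = labels.get(e.get("pid"), str(e.get("pid")))
        totals[(label, e["name"])] += e.get("dur", 0) / 1000.0  # us -> ms
    return dict(totals)


# Tiny synthetic trace; process and node names are invented.
trace = json.dumps({"traceEvents": [
    {"ph": "M", "pid": 1, "name": "process_name", "args": {"name": "GPU:0"}},
    {"ph": "M", "pid": 2, "name": "process_name", "args": {"name": "CPU:0"}},
    {"ph": "X", "pid": 1, "name": "conv1", "ts": 0, "dur": 2500},
    {"ph": "X", "pid": 1, "name": "conv1", "ts": 5000, "dur": 1500},
    {"ph": "X", "pid": 2, "name": "nms", "ts": 3000, "dur": 8000},
]})
print(per_node_times_ms(trace))
# {('GPU:0', 'conv1'): 4.0, ('CPU:0', 'nms'): 8.0}
```

Per-node totals of this kind are the raw material for the timing diagrams of Figs. 2 and 3 and for choosing the cut node.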

Embodiment two:

Starting from embodiment one, the running state of the CPU and GPU, and their running times, are observed through chrome://tracing/, as shown in Figs. 2 and 3. A suitable node is chosen and the model is cut at that node. The part before the node runs on the GPU and the part after it runs on the CPU. The result achieved is 27 FPS, a 3x speed-up over embodiment one.
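The reported frame rates translate into average per-frame times as follows. This is plain arithmetic on the two figures given in the embodiments, with t denoting the average time per frame.

```latex
t = \frac{1}{\text{FPS}}, \qquad
t_1 = \frac{1000\ \text{ms}}{9} \approx 111.1\ \text{ms}, \qquad
t_2 = \frac{1000\ \text{ms}}{27} \approx 37.0\ \text{ms}, \qquad
\frac{t_1}{t_2} = \frac{27}{9} = 3
```

Splitting the model therefore saves roughly 74 ms per frame on average.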

Manually designing the model's split point therefore makes full use of the performance of the CPU and GPU and can greatly increase computation speed.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the appended claims and their equivalents.

Claims

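1. A method for improving computation speed through model segmentation, characterized by comprising the following steps:

Step 1: connect the test board to the camera and to the input/output devices;

Step 2: boot the test board into the system, then open and initialize the inspection software;

Step 3: start the camera and begin running;

Step 4: use the inspection software to check the individual running times of the GPU and the CPU, and analyze their relative running times;

Step 5: based on the analysis, choose a suitable node and cut the model's overall computation at that node, with the first half computed on the GPU and the second half on the CPU.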

2. The method for improving computation speed through model segmentation according to claim 1, characterized in that the test board in step 1 includes an image processing evaluation board.

3. The method for improving computation speed through model segmentation according to claim 1, characterized in that the input/output devices in step 1 are one or more of a keyboard, a mouse, a display, a USB hub, an AC adapter, and a video output cable.

4. The method for improving computation speed through model segmentation according to claim 1, characterized in that the inspection software in step 2 can be one or more of the Chrome browser, CPU-Z, and GPU-Z.

5. The method for improving computation speed through model segmentation according to claim 1, characterized in that the CPU in step 4 includes an arithmetic unit, a cache memory, and the buses that carry data, control, and status signals between them.

6. The method for improving computation speed through model segmentation according to claim 1, characterized in that, in step 5, the model running time is the overall duration of the device's operation as monitored by the inspection software.