CN109089097A - Focus-object selection method based on VR image processing - Google Patents
- Publication number: CN109089097A (application number CN201810989621.5A)
- Authority
- CN
- China
- Prior art keywords
- focus
- image
- merging
- priority
- candidate regions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
This application discloses a focus-object selection method based on VR image processing, comprising the steps of: determining the focus object; and transmitting the focus object in high definition while synchronously transmitting the background image in non-high definition. The method reduces the amount of data transmitted over the network, avoids transmission delay, and still presents the viewer with what appears to be a high-definition video.
Description
Technical field
This application relates to the field of VR image transmission, and in particular to a focus-object selection method based on VR image processing.
Background art
Resolution requirements in image transmission keep rising, especially in VR, where binocular output doubles the amount of image data that must be transmitted. Under current practice, if the sender compresses the video with an x264 encoder and the receiver decodes it with H.264, the compressed video stream can be transmitted in real time and the amount of network data is reduced substantially. Even so, transmitting high-definition and ultra-high-definition video over a network still causes aggravated delay and prevents real-time viewing, and the transmission pressure is greater still for 4K or 8K video.
In addition, in video encoding a higher compression ratio means lower clarity, so existing schemes can satisfy neither immediate transmission of video that still looks high-definition to the viewer nor the panoramic high-definition output that VR binocular rendering demands.
Summary of the invention
The purpose of this application is to provide a focus-object selection method based on VR image processing that reduces the amount of data transmitted over the network, avoids transmission delay, and still presents the viewer with what appears to be a high-definition video.
To this end, the application provides a focus-object selection method based on VR image processing, comprising the following steps: determining the focus object; and transmitting the focus object in high definition while synchronously transmitting the background image in non-high definition.
Preferably, the focus object is determined as follows: receive video captured by a depth camera; obtain the image information and the depth image of each frame of the video; list all objects in each frame using the R-CNN algorithm; and rank the objects by priority, the highest-priority object becoming the focus object.
Preferably, objects are ranked by their closest distance to the camera: the smaller the distance, the higher the priority.
Preferably, objects are ranked by type, the types including at least person, animal, and inanimate object; a person has higher priority than an animal, and an animal has higher priority than an inanimate object.
Preferably, listing all objects in an image with the R-CNN algorithm comprises: generating multiple candidate regions for each image; extracting features from each candidate region with a deep network; feeding the extracted features into one SVM classifier per class; after the class of a candidate region is confirmed, verifying it further by bounding-box refinement; and analysing the candidate regions with depth information to enumerate the objects that could serve as the focus.
Preferably, candidate regions are generated as follows: divide the image into many small regions; examine each small region and merge adjacent regions that satisfy a merging rule, repeating until the whole image has been merged into a single region; and output every region that appeared during the process, from the initial division to the final merge, as a candidate region.
Preferably, the merging rules comprise: a first rule, small regions of similar color; a second rule, small regions of similar texture; a third rule, merges whose combined area is small; and a fourth rule, merges whose combined area fills a large proportion of its bounding box (BBOX).
Preferably, the four merging rules are evaluated simultaneously when merging small regions.
Preferably, region merging is carried out in multiple color spaces, including at least the RGB, HSV, and Lab color spaces.
Preferably, transmitting the focus object in high definition while synchronously transmitting the background image in non-high definition comprises: computing the boundary of each focus object; computing the average depth value, a first edge contour, and a second edge contour of all focus objects; and fusing the focus object with the background image according to the first and second edge contours before completing the transmission.
The beneficial effect of this application is that it reduces the amount of data transmitted over the network, avoids transmission delay, and still presents the viewer with what appears to be a high-definition video.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings illustrate only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of the focus-object selection method based on VR image processing;
Fig. 2 is a flowchart of one embodiment of the method for determining the focus object;
Fig. 3 is a flowchart of one embodiment of the method for listing all objects in an image with the R-CNN algorithm;
Fig. 4 is a flowchart of one embodiment of the method for generating candidate regions;
Fig. 5 is a flowchart of one embodiment of the method for transmitting the focus object in high definition while synchronously transmitting the background image in non-high definition.
Specific embodiments
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by a person skilled in the art without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, the invention provides a focus-object selection method based on VR image processing, comprising the following steps:
S110: determine the focus object.
Further, as shown in Fig. 2, the focus object is determined as follows:
S210: receive the video captured by a depth camera.
Specifically, an operator captures video with a depth camera and uploads it to a computer for processing. The captured video is preferably 4K for its high resolution, but is not limited to 4K; resolutions below or above 4K are also possible.
S220: obtain the color image and the depth image of each frame of the video.
Specifically, the color image is an ordinary two-dimensional image, while the depth image records, for each pixel of the two-dimensional image, the distance between the camera and the corresponding point on the real object. The depth image is obtained with a device such as a Kinect or a RealSense camera.
S230: list all objects in each frame using the R-CNN algorithm.
Specifically, R-CNN is an existing deep-learning algorithm; the objects it can recognise are fixed by the object classes that took part in training. If, for example, it was trained on the four classes person, cat, dog, and chicken, then only those four classes can appear in the computed object list.
Further, as shown in Fig. 3, the objects in an image are listed with the R-CNN algorithm as follows:
S310: generate multiple candidate regions for each image. Specifically, roughly 1,000 to 2,000 candidate regions are generated per image.
Further, as shown in Fig. 4, the candidate regions are generated as follows:
S410: divide the image into many small regions.
S420: examine each small region and merge adjacent regions that satisfy a merging rule, repeating until the whole image has been merged into a single region.
Specifically, there are four merging rules:
First rule: small regions of similar color (measured by color histogram). The similarity range is defined over the color space.
Second rule: small regions of similar texture (measured by gradient histogram); regions whose gradients are similar take part in the merge.
Third rule: prefer merges whose combined area is small. This keeps the scale of merge operations uniform and prevents one large region from swallowing its smaller neighbours one by one.
Specifically, the merge is computed over the 1,000-2,000 candidate regions. Suppose there are regions a-b-c-d-e-f-g-h; the preferred merge order is ab-cd-ef-gh -> abcd-efgh -> abcdefgh.
Fourth rule: prefer merges whose combined area fills a large proportion of its bounding box (BBOX).
Preferably, in the candidate-region generation of this embodiment, the first, second, third, and fourth rules are evaluated simultaneously. The shape of a merged region is determined automatically by the algorithm. The features of a merged region (color histogram, texture histogram, area, and position) can be computed directly from the features of its sub-regions, which is fast.
S430: output every region that appeared during the process, from the initial division to the final merge, as a candidate region.
Specifically, this means every region seen from the initial division through the final merge: all of the initial small regions, every intermediate merged region, and the final region. For example, given regions a-b-c-d-e-f-g-h merged in the order ab-cd-ef-gh -> abcd-efgh -> abcdefgh, the output candidate regions are a, b, c, d, e, f, g, h, ab, cd, ef, gh, abcd, efgh, and abcdefgh.
Further, to avoid missing candidate regions, the operations above are carried out in several color spaces simultaneously (RGB, HSV, Lab, and so on). Within each color space, the first, second, third, and fourth rules are applied in various combinations before merging. After duplicate regions are removed from the merged results, the remaining regions are output as candidate regions.
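The candidate-region bookkeeping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the balanced pairwise merge order (ab-cd-ef-gh -> abcd-efgh -> abcdefgh) is taken directly from the example in the text, whereas a real selective-search merger would pick the most similar adjacent pair at each step, and the function name is our own.

```python
def generate_candidates(regions):
    """Return every region produced while merging `regions` bottom-up:
    the initial small regions, every intermediate merge, and the final
    whole-image region are all emitted as candidate regions."""
    candidates = list(regions)          # the initial small regions
    level = list(regions)
    while len(level) > 1:
        merged = []
        # merge adjacent pairs; an odd trailing region is carried over
        for i in range(0, len(level) - 1, 2):
            merged.append(level[i] + level[i + 1])
        if len(level) % 2:
            merged.append(level[-1])
        candidates.extend(merged)
        level = merged
    return candidates

# 8 initial + 4 + 2 + 1 = 15 candidate regions, as in the text's example
cands = generate_candidates(list("abcdefgh"))
```

Emitting the intermediate merges, not just the final segmentation, is what gives the detector proposals at several scales from a single merge pass.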
Continuing with Fig. 3, S320: extract features from each candidate region with a deep network.
Specifically, the deep network is an existing feature-extraction technique that computes a feature vector for a region; the computation is carried out separately in the RGB, HSV, and Lab color spaces.
Further, the data must be preprocessed before feature extraction.
Specifically, every candidate region is normalised to the same size, preferably 227 x 227 pixels.
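Normalising each candidate region to a fixed network input can be sketched with a nearest-neighbour resample. The 227 x 227 target matches the figure in the text (the classic R-CNN/AlexNet input size); the function name and the list-of-lists image representation are illustrative assumptions of this sketch.

```python
def warp_to_fixed_size(img, size=227):
    """Nearest-neighbour resample of a 2-D pixel grid (list of rows)
    to a fixed size x size grid, as each region proposal is warped
    before being fed to the feature-extraction network."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

# a tiny 4x6 "region" warped up to the network's fixed input size
region = [[10 * r + c for c in range(6)] for r in range(4)]
warped = warp_to_fixed_size(region)
```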
S330: feed the features extracted from each candidate region into one SVM classifier per class.
Specifically, the classes are calibrated manually. For example, if one class is "person", manually labelled images of people are fed into the deep network for training, and the resulting model is calibrated to the "person" class; after training, a similar feature appearing in an image is classified as "person", while objects of classes that were never calibrated will not be recognised.
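The per-class SVM stage can be sketched as one-vs-rest linear scoring: each manually calibrated class owns a weight vector and bias, and a candidate region's feature vector is scored against every class. The weights below are toy values for illustration, not trained parameters.

```python
def svm_scores(feature, classifiers):
    """Score one candidate-region feature against one linear SVM per
    class (score = w . x + b); a positive score indicates membership
    in that class, and uncalibrated classes simply do not exist here."""
    return {name: sum(wi * xi for wi, xi in zip(w, feature)) + b
            for name, (w, b) in classifiers.items()}

# toy classifiers for two manually calibrated classes
classifiers = {"person": ([1.0, -1.0], 0.5),
               "dog":    ([-1.0, 1.0], 0.0)}
scores = svm_scores([2.0, 1.0], classifiers)
```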
S340: after the class of a candidate region is confirmed, verify it further by bounding-box refinement.
Specifically, a linear ridge-regression model refines the candidate boxes of each class: it takes the 4096-dimensional feature of the deep network's pool5 layer as input and outputs a scaling and translation in the x and y directions. The training samples are the candidate boxes of that class whose overlap with the ground-truth box exceeds 0.6.
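The 0.6 overlap criterion for selecting the regressor's training samples is the usual intersection-over-union test; a minimal sketch (the (x1, y1, x2, y2) box layout and function names are our assumptions):

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def iou(a, b):
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_training_boxes(proposals, ground_truth, thresh=0.6):
    """Keep only the candidate boxes whose overlap with the ground-truth
    box exceeds the threshold, as described for the refinement step."""
    return [p for p in proposals if iou(p, ground_truth) > thresh]
```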
S350: analyse the candidate regions with depth information and enumerate the objects that could serve as the focus.
Specifically, possible objects are computed both before and after candidate regions are merged; merged candidate regions that cover the same possible object are recomputed, and all objects that could serve as the focus are listed.
Continuing with Fig. 2, after all objects in an image have been determined, S240: rank the objects by priority; the highest-priority object becomes the focus object.
Further, in one embodiment, the closest distance between each object and the camera is recorded and the objects are sorted from nearest to farthest; the object closest to the camera receives the highest focus priority.
Specifically, a pre-trained deep-learning model computes the distance between every object in each image and the camera. The model is trained by manually labelling the focus object in each image and feeding the labelled data into an existing deep-learning model. For example, in a first manual annotation, a person is 10 m from the camera and all animals are more than 10 m away, so the person is the focus object. In a second annotation, a person is 10 m from the camera while an unidentified creature is 5 m away, so the unidentified creature is the focus object. In a third annotation, a person is 10 m from the camera while a first bear and a second bear are 6 m and 8 m away respectively, and the person is the focus object.
In another embodiment, objects can instead be ranked by type: person, animal, or inanimate object. Preferably, a person has higher priority than an animal and an animal has higher priority than an inanimate object; the highest-priority object becomes the focus object.
Preferably, when determining the focus object, the type of the object is judged first and its distance from the camera second. Specifically, when a person, an unidentified creature, and an animal appear in the same image, the person has higher focus priority than the unidentified creature, and the unidentified creature has higher focus priority than the animal, regardless of their distances from the camera.
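The combined ranking (type first, distance second) can be sketched as a two-key sort. The rank table and dictionary field names are illustrative assumptions; "unknown" stands for the unidentified creature described above.

```python
# lower rank = higher focus priority
TYPE_RANK = {"person": 0, "unknown": 1, "animal": 2, "object": 3}

def rank_objects(objects):
    """Sort detected objects by type priority first, then by the closest
    distance to the camera; the first element becomes the focus object."""
    return sorted(objects, key=lambda o: (TYPE_RANK[o["type"]], o["dist_m"]))

detected = [{"type": "animal",  "dist_m": 6.0},
            {"type": "person",  "dist_m": 10.0},
            {"type": "unknown", "dist_m": 5.0}]
focus = rank_objects(detected)[0]  # the person wins despite being farthest
```

The tuple sort key makes distance a tie-breaker only within a type, which matches the third annotation example where the person beats two nearer bears.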
Continuing with Fig. 1, S120: transmit the focus object in high definition while synchronously transmitting the background image in non-high definition.
Further, as shown in Fig. 5, this is done as follows:
S510: compute the boundary of each focus object.
Specifically, the boundary of a focus object is computed from its depth information: for a given object the depth values show a clear discontinuity at its boundary, and the depth values are obtained from the pre-trained depth network. For example, when a person is 10 m from the camera, depth values between 9.8 m and 10.2 m around the person belong to that person; anything beyond this threshold belongs to other objects or to distant scenery such as mountains, forests, or buildings.
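The plus/minus 0.2 m depth band around an object's distance can be turned into a binary object mask; the function name and the list-of-lists depth map are illustrative assumptions of this sketch.

```python
def focus_mask(depth_map, object_depth, band=0.2):
    """Mark each pixel 1 if its depth lies within +/-band metres of the
    object's distance (e.g. 9.8-10.2 m for a person 10 m away), else 0;
    pixels outside the band belong to other objects or distant scenery."""
    return [[1 if abs(d - object_depth) <= band else 0 for d in row]
            for row in depth_map]

# one row of depth values: two pixels on the person, one on distant scenery
mask = focus_mask([[9.9, 10.1, 12.0]], 10.0)
```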
S520: compute the average depth value, the first edge contour, and the second edge contour of all focus objects.
Specifically, an average depth value is computed from the depth information of all focus objects. Within this depth range, each pixel is evaluated with the existing depth network, and a threshold of 20 cm defines the object boundary; from this boundary a first edge contour and a second edge contour are computed. The first edge contour approximates the outline of the focus object; the second edge contour is larger than the first and encloses it.
S530: fuse the focus object with the background image according to the first and second edge contours, and complete the transmission.
Specifically, the video captured by the depth camera and the focus object are transmitted synchronously to the VR client. The client first renders the original image as a non-high-definition background image, then renders a layer of the high-definition focus object over the region of the background corresponding to the focus object, blending gradually from the first edge contour out to the second edge contour to form a transition region, so that the processed image looks visually comfortable when displayed.
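The gradual blend from the first edge contour out to the second can be sketched as a linear alpha ramp. Treating the two contours as inner and outer radii around the object centre is a simplifying assumption of this sketch, as are the function names.

```python
def blend_weight(dist, r_inner, r_outer):
    """Alpha of the high-definition focus layer: 1 inside the first
    (inner) edge contour, falling linearly to 0 at the second (outer)
    edge contour, so the HD object fades into the low-definition
    background instead of ending at a hard seam."""
    if dist <= r_inner:
        return 1.0
    if dist >= r_outer:
        return 0.0
    return (r_outer - dist) / (r_outer - r_inner)

def fuse(hd_pixel, bg_pixel, dist, r_inner, r_outer):
    """Blend one HD focus pixel over the corresponding background pixel."""
    a = blend_weight(dist, r_inner, r_outer)
    return a * hd_pixel + (1.0 - a) * bg_pixel
```

In a real renderer the same ramp would be evaluated per pixel against the actual contour distances rather than radii, but the transition-region idea is identical.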
The beneficial effect of this application is that it reduces the amount of data transmitted over the network, avoids transmission delay, and still presents the viewer with what appears to be a high-definition video.
Although preferred embodiments of this application have been described, a person skilled in the art can make additional changes and modifications once the basic inventive concept is known, so the appended claims are intended to cover the preferred embodiments and all changes and modifications that fall within the scope of this application. Obviously, a person skilled in the art can make various modifications and variations to this application without departing from its spirit and scope; if these modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to encompass them as well.
Claims (10)
1. A focus-object selection method based on VR image processing, characterized by comprising the following steps:
determining the focus object;
transmitting the focus object in high definition while synchronously transmitting the background image in non-high definition.
2. The focus-object selection method according to claim 1, characterized in that the focus object is determined as follows:
receiving video captured by a depth camera;
obtaining the image information and the depth image of each frame of the video;
listing all objects in each frame using the R-CNN algorithm;
ranking the objects by priority, the highest-priority object becoming the focus object.
3. The focus-object selection method according to claim 2, characterized in that the objects are ranked by their closest distance to the camera, a smaller distance meaning a higher priority.
4. The focus-object selection method according to claim 2, characterized in that the objects are ranked by type, the types including at least person, animal, and inanimate object, a person having higher priority than an animal and an animal having higher priority than an inanimate object.
5. The focus-object selection method according to claim 2, characterized in that all objects in an image are listed with the R-CNN algorithm as follows:
generating multiple candidate regions for each image;
extracting features from each candidate region with a deep network;
feeding the extracted features into one SVM classifier per class;
after the class of a candidate region is confirmed, verifying it further by bounding-box refinement;
analysing the candidate regions with depth information to enumerate the objects that could serve as the focus.
6. The focus-object selection method according to claim 5, characterized in that the candidate regions are generated as follows:
dividing the image into many small regions;
examining each small region and merging adjacent regions that satisfy a merging rule, repeating until the whole image has been merged into a single region;
outputting every region that appeared during the process, from the initial division to the final merge, as a candidate region.
7. The focus-object selection method according to claim 6, characterized in that the merging rules comprise:
a first rule: small regions of similar color;
a second rule: small regions of similar texture;
a third rule: merges whose combined area is small;
a fourth rule: merges whose combined area fills a large proportion of its BBOX.
8. The focus-object selection method according to claim 7, characterized in that the four merging rules are evaluated simultaneously when merging small regions.
9. The focus-object selection method according to claim 6, characterized in that region merging is carried out in multiple color spaces, including at least the RGB, HSV, and Lab color spaces.
10. The focus-object selection method according to claim 6, characterized in that the focus object is transmitted in high definition and the background image synchronously in non-high definition as follows:
computing the boundary of each focus object;
computing the average depth value, the first edge contour, and the second edge contour of all focus objects;
fusing the focus object with the background image according to the first and second edge contours, and completing the transmission.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810989621.5A CN109089097A (en) | 2018-08-28 | 2018-08-28 | A kind of object of focus choosing method based on VR image procossing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810989621.5A CN109089097A (en) | 2018-08-28 | 2018-08-28 | A kind of object of focus choosing method based on VR image procossing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109089097A true CN109089097A (en) | 2018-12-25 |
Family
ID=64794931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810989621.5A Pending CN109089097A (en) | 2018-08-28 | 2018-08-28 | A kind of object of focus choosing method based on VR image procossing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109089097A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021073268A1 (en) * | 2019-10-15 | 2021-04-22 | Beijing SenseTime Technology Development Co., Ltd. | Augmented reality data presentation method and apparatus, electronic device, and storage medium |
CN113297876A (en) * | 2020-02-21 | 2021-08-24 | Foshan Viomi Electrical Technology Co., Ltd. | Motion posture correction method based on intelligent refrigerator, intelligent refrigerator and storage medium |
WO2022184569A1 (en) * | 2021-03-01 | 2022-09-09 | Signify Holding B.V. | Displaying an aggregation of data in dependence on a distance to a closest device in an image |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103517072A (en) * | 2012-06-18 | 2014-01-15 | Lenovo (Beijing) Co., Ltd. | Video communication method and video communication equipment |
CN103999096A (en) * | 2011-12-16 | 2014-08-20 | Intel Corporation | Reduced image quality for video data background regions |
CN104349142A (en) * | 2014-11-03 | 2015-02-11 | Nanjing University of Aeronautics and Astronautics | Layered representation-based unmanned plane video adaptive transmission method |
CN104427291A (en) * | 2013-08-19 | 2015-03-18 | Huawei Technologies Co., Ltd. | Image processing method and device |
CN104732210A (en) * | 2015-03-17 | 2015-06-24 | Shenzhen SuperD Optoelectronics Co., Ltd. | Target human face tracking method and electronic equipment |
CN105608716A (en) * | 2015-12-21 | 2016-05-25 | Lenovo (Beijing) Co., Ltd. | Information processing method and electronic equipment |
CN106530265A (en) * | 2016-11-08 | 2017-03-22 | Hohai University | Adaptive image fusion method based on chromaticity coordinates |
CN107409203A (en) * | 2015-05-27 | 2017-11-28 | Google LLC | Method and apparatus for reducing spherical video bandwidth to user headsets |
CN107439010A (en) * | 2015-05-27 | 2017-12-05 | Google LLC | Streaming spherical video |
CN107547804A (en) * | 2017-09-21 | 2018-01-05 | Beijing Qihoo Technology Co., Ltd. | Video data processing method and device for scene rendering, and computing device |
CN107610149A (en) * | 2017-09-25 | 2018-01-19 | Beijing Qihoo Technology Co., Ltd. | Image segmentation result edge optimization processing method, device and computing device |
US20180075582A1 (en) * | 2016-09-15 | 2018-03-15 | International Business Machines Corporation | Image processing |
Non-Patent Citations (1)
Title |
---|
SHENXIAOLU1984: "[Object Detection] Detailed explanation of the RCNN algorithm", https://blog.csdn.net/shenxiaolu1984/article/details/51066975 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | From big to small: Multi-scale local planar guidance for monocular depth estimation | |
TWI524734B (en) | Method and device for generating a depth map | |
CN106157307B (en) | A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF | |
EP2639761B1 (en) | Depth information generator, depth information generation method, and stereoscopic image converter | |
CN101416520B (en) | Efficient encoding of multiple views | |
CN102741879B (en) | Method for generating depth maps from monocular images and systems using the same | |
US11840357B2 (en) | Method and device for dual-light image integration, and unmanned aerial vehicle | |
CN100571335C (en) | Image syncretizing effect real-time estimating method and device based on pixel space relativity | |
CN109089097A (en) | A kind of object of focus choosing method based on VR image procossing | |
CN110084757A (en) | A kind of infrared depth image enhancement method based on generation confrontation network | |
CN103430210A (en) | Information processing system, information processing device, imaging device, and information processing method | |
CN110335222B (en) | Self-correction weak supervision binocular parallax extraction method and device based on neural network | |
CN113159028A (en) | Saliency-aware image cropping method and apparatus, computing device, and storage medium | |
Cong et al. | Discrete haze level dehazing network | |
CN103260043A (en) | Binocular stereo image matching method and system based on learning | |
CN106934828A (en) | Depth image processing method and depth image processing system | |
CN103167306A (en) | Device and method for extracting high-resolution depth map in real time based on image matching | |
CN116665092A (en) | Method and system for identifying sewage suspended matters based on IA-YOLOV7 | |
WO2021205068A1 (en) | A method, an apparatus and a computer program product for volumetric video coding | |
CN109191381A (en) | A kind of method and system of calibration focus processing image | |
CN113628121B (en) | Method and device for processing and training multimedia data | |
CN112633198A (en) | Picture processing method and device, storage medium and electronic device | |
WO2019185983A1 (en) | A method, an apparatus and a computer program product for encoding and decoding digital volumetric video | |
Bae et al. | Efficient and scalable view generation from a single image using fully convolutional networks | |
EP4311234A1 (en) | A method, an apparatus and a computer program product for volumetric video synchronization using spatial neural attention network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20181225 |