CN102034267A - Three-dimensional reconstruction method of target based on attention

Three-dimensional reconstruction method of target based on attention

Info

Publication number
CN102034267A
CN102034267A · CN201010574274A · CN 201010574274
Authority
CN
China
Prior art keywords
video
frame
saliency map
attention
dimensional reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010574274
Other languages
Chinese (zh)
Inventor
徐常胜
肖宪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN 201010574274 (published as CN102034267A)
Publication of CN102034267A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to an attention-based method for three-dimensional reconstruction of a target, comprising the following steps. Step S1: segment the video used for three-dimensional reconstruction into video frames, analyze the distribution of visual attention in the frames from the static, location, and dynamic aspects, and obtain the corresponding saliency maps; fuse the static, location, and dynamic saliency maps to obtain a video-based saliency map for each frame, the salient regions described by these maps being the regions of interest for the reconstruction. Step S2: cluster all video frames using the GIST global feature, select a candidate keyframe set according to the saliency map generated for each frame, and finally extract the video keyframes used for reconstruction by analyzing geometric and visual constraints. Step S3: using the video keyframes and their corresponding saliency maps, perform three-dimensional reconstruction only on the salient regions of the frames, thereby obtaining an accurate three-dimensional model of the regions of interest and increasing reconstruction speed.

Description

Attention-based method for three-dimensional reconstruction of an object
Technical field
The invention belongs to the technical fields of computer vision, image processing, and multimedia analysis, and relates to an attention-based method for three-dimensional reconstruction of an object.
Background art
With the development of digital imaging, high-quality video is becoming ever more abundant. Video-based three-dimensional reconstruction can exploit this rich video data to improve reconstruction precision and visual quality, and has therefore become a very popular research topic in computer vision, image processing, and multimedia analysis.
In general, video-based three-dimensional reconstruction systems fall into two classes: calibration-based systems and self-calibrating systems. Calibration-based systems require both images and camera parameters to perform the reconstruction; an example is the patch-based multi-view stereo method (PMVS), which recovers the three-dimensional structure of an object or scene by enforcing local photometric consistency and a global visibility constraint. Self-calibrating systems first estimate the camera parameters with a camera self-calibration algorithm and then recover a three-dimensional point cloud. Current approaches, however, only provide a reconstruction of the overall scene, whereas people usually attend only to the regions that attract their notice. Such methods waste a great deal of computation on reconstructing regions of no interest, and the resulting three-dimensional models cannot highlight the regions of interest.
People tend to attend to visually salient regions, and visual attention analysis can locate such salient regions; it has been widely applied in computer vision, artificial intelligence, and multimedia processing. Most previous work concentrates on still images and mainly uses static information. Recently, attention analysis for video has attracted more interest; a main approach combines static and location saliency maps to obtain the regions of interest in keyframes. Besides static and location attention, dynamic attention has likewise drawn notice and is widely used in region-of-interest detection based on spatio-temporal information. Many methods, such as optical flow, can be used to obtain motion vectors, but estimating motion vectors under a moving camera remains a challenging problem, and analyzing dynamic attention only from the viewer's perspective is insufficient.
Summary of the invention
To address the unsatisfactory accuracy of prior-art three-dimensional reconstruction, the object of the present invention is to propose a spatio-temporal attention-region detection method that enhances video-based three-dimensional reconstruction; to this end, an attention-based object three-dimensional reconstruction method is provided.
To achieve the above object, the technical solution of the attention-based object three-dimensional reconstruction method provided by the present invention improves the quality of the three-dimensional reconstruction and accelerates it by analyzing the regions of interest in the video frames, and comprises the following steps:
Step S1: segment the video used for three-dimensional reconstruction into video frames, analyze the distribution of visual attention in each frame from the static, location, and dynamic aspects, and obtain the corresponding static, location, and dynamic saliency maps; fuse the saliency maps of the three aspects to obtain a video-based saliency map for each frame, the salient regions described by the saliency map being the regions of interest in the three-dimensional reconstruction;
Step S2: cluster all video frames using the GIST global feature, select a candidate keyframe set according to the saliency map generated for each frame, and finally extract the video keyframes used for three-dimensional reconstruction by analyzing geometric and visual constraints;
Step S3: using the video keyframes and their corresponding saliency maps, perform three-dimensional reconstruction only on the salient regions of the frames, so as to obtain an accurate three-dimensional model of the regions of interest and to accelerate the reconstruction.
The visual attention analysis of the video comprises static attention analysis, location attention analysis, dynamic attention analysis, and attention fusion;
For each video frame, static attention analysis is performed with a method combining contrast-based and information-theoretic measures, obtaining the static saliency map;
For each video frame, the camera motion is described in the horizontal, vertical, and radial aspects and location attention analysis is performed by integral template matching, obtaining the location saliency map;
For each pair of adjacent video frames, dynamic attention analysis is performed from the perspectives of both the video viewer and the videographer, obtaining the dynamic saliency map of the earlier of the two frames;
For the static, location, and dynamic saliency maps obtained for each frame, attention fusion is performed dynamically: the respective fusion weights are computed from the relation between the means of the static and dynamic saliency maps, finally yielding the fused visual saliency map of each frame.
The extraction of the video keyframes used for three-dimensional reconstruction comprises the following steps:
Step S21: first cluster all frames into k cluster classes using the GIST global feature descriptor;
Step S22: for each cluster class, obtain the class saliency map by averaging all saliency maps within the class;
Step S23: compute the distance between the saliency map of each frame in a cluster and the class saliency map, and from each cluster class select the 10% of frames whose distance to the class saliency map is smallest as the candidate keyframe set;
Step S24: form a frame group from any k frames of the candidate keyframe set, provided they come from different classes; rank all frame groups according to the geometric and visual constraints and finally determine the keyframe group.
The step of performing three-dimensional reconstruction only on the salient regions of the video frames is as follows:
Step S31: recover the camera parameters of the keyframes automatically by the structure-from-motion method; then detect corner points in each keyframe with the difference-of-Gaussians and Harris detectors; describe the region of interest of each keyframe by its visual saliency values; delete, using the frame saliency map, those detected features that lie outside the region of interest; finally, supply the features lying inside the region of interest for recovering the three-dimensional information;
Step S32: match the image features lying in the regions of interest using the epipolar constraint between two images, thereby forming a sparse patch distribution in the salient regions and obtaining the initial matched patches;
Step S33: repeat n times the expansion of the initial matched patches into their surroundings, obtaining a dense patch distribution;
Step S34: repeat n times, on the dense patch distribution, the elimination of erroneously matched patches according to the visual constraint, realizing the attention-enhanced three-dimensional reconstruction.
Beneficial effects of the invention: by performing visual attention analysis on each video frame, the invention obtains fairly accurate regions of interest in each frame; and by clustering the video frames on global features and extracting keyframes under the visual and geometric constraints, it obtains the video keyframes, and the regions of interest within them, that benefit the three-dimensional reconstruction. Performing three-dimensional reconstruction on the features inside the regions of interest of the keyframes yields accurate reconstruction results and increases the speed of the reconstruction.
Indoor and outdoor experiments in real environments demonstrate that the method of the invention achieves higher accuracy and higher computational efficiency.
Description of drawings
Fig. 1 is the framework of the visual-attention-based object three-dimensional reconstruction of the present invention.
Fig. 2 shows the visual attention analysis results in an indoor environment.
Fig. 3 shows the visual attention analysis results in an outdoor environment.
Fig. 4 shows the video keyframe extraction results in an indoor environment.
Fig. 5 shows the video keyframe extraction results in an outdoor environment.
Fig. 6 is an example of indoor scene reconstruction.
Fig. 7 is an example of outdoor scene reconstruction.
Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
Considering the wide application of visual attention analysis in region-of-interest detection, the present invention proposes a spatio-temporal attention-region detection method to enhance video-based three-dimensional reconstruction. The method accommodates the characteristics of video-based reconstruction: unknown camera motion and the sudden appearance of objects in the video (for example birds, pedestrians, or passing vehicles). Compared with traditional three-dimensional reconstruction methods, the method of the invention obtains a more accurate three-dimensional model at a lower computational cost. The computers used in the invention all run the Windows XP operating system on a dual-core 2.2 GHz processor with 2 GB of memory. Fig. 1 shows the framework of the visual-attention-based object three-dimensional reconstruction, which comprises three parts: 1) video-based visual attention analysis; 2) video keyframe extraction; 3) region-of-interest-enhanced video three-dimensional reconstruction. Each part is described below.
1 Video-based visual attention analysis
Video-based visual attention analysis comprises four parts: static attention analysis, location attention analysis, dynamic attention analysis, and attention fusion. Static attention analysis concerns the static objects in a video frame that attract human visual attention. Location attention analysis concerns the attention induced by position within the frame, which shifts as the camera moves. Dynamic attention analysis concerns the attention drawn by moving objects in adjacent frames; by analyzing the motion intensity of every pixel in a frame, a dynamic saliency map of the attention-drawing regions is obtained. After visual attention has been analyzed in the static, location, and dynamic aspects, the present invention fuses the results of the three analyses to obtain the final visual attention analysis.
1.1 Static attention analysis
Stationary objects can attract human attention; this is so-called static attention. Contrast-based attention analysis introduces the center-surround structure of feature contrast in the human visual system (HVS). Information-theoretic methods rest on the premise that visual attention is drawn to maximally informative samples. Contrast and information sampling are thus the two factors for computing saliency. The present invention fuses the contrast-based method and the information-theoretic method to compute the static saliency map, as in formula (1):
Map_static(x, y) = Con(x, y) × ID(x, y)    (1)
where Map_static(x, y) is the static saliency value at point (x, y), Con(x, y) is the normalized contrast feature, and ID(x, y) is the information feature.
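As an illustration of formula (1), the following minimal Python sketch computes a static saliency map. The patent does not fix the exact contrast and information measures, so a center-surround contrast and the per-pixel Shannon self-information are used here as assumed stand-ins for Con(x, y) and ID(x, y).

```python
import numpy as np
import cv2

def static_saliency(gray):
    """Formula (1): Map_static(x, y) = Con(x, y) x ID(x, y), elementwise."""
    g = gray.astype(np.float32) / 255.0
    # Con(x, y): center-surround contrast, i.e. deviation from a blurred
    # surround mean, normalized to [0, 1].
    surround = cv2.GaussianBlur(g, (0, 0), sigmaX=8)
    con = np.abs(g - surround)
    con /= con.max() + 1e-8
    # ID(x, y): self-information -log p(v) of each pixel's intensity v,
    # normalized to [0, 1] (an assumed, simple information measure).
    levels = (g * 255).astype(np.uint8)
    p = np.bincount(levels.ravel(), minlength=256) / levels.size
    info = -np.log(p[levels] + 1e-8)
    info /= info.max()
    return con * info
```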
1.2 Location attention analysis
The present invention describes the camera motion in three aspects, horizontal (H), vertical (V), and radial (R), using the integral template matching technique. With the three-parameter model of H, V, and R, the following three formulas compute the camera motion intensity in the horizontal direction, Map_H(i, j), the vertical direction, Map_V(i, j), and the radial direction, Map_R(i, j), as in formulas (2)-(4):
Map_H(i, j) = max(0, 1 − |j − width/2 − k_H × H| / (width/2))    (2)
Map_V(i, j) = max(0, 1 − |i − height/2 − k_V × V| / (height/2))    (3)
Map_R(i, j) = 1 − r/r_max if R ≥ 0, and k_r × r/r_max if R < 0    (4)
Here, i and j give the pixel position, r denotes the distance from the pixel to the frame center, and r_max is the maximum value of r. k_H, k_V, and k_r are constants; H denotes the horizontal motion of the camera, V its vertical motion, and R its radial motion; max takes the maximum value; width and height denote the pixel width and height of the current video frame.
The final location saliency map of the camera, Map_loc, is expressed as follows (formula (5)):
Map_loc = Map_H + Map_V + Map_R    (5)
where Map_H denotes the camera motion intensity in the horizontal direction, Map_V the motion intensity in the vertical direction, and Map_R the radial motion intensity.
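Formulas (2)-(5) translate directly into array operations. A minimal sketch, assuming the motion parameters H, V, and R have already been estimated (for example by the integral template matching above) and using illustrative values for the constants k_H, k_V, and k_r:

```python
import numpy as np

def location_saliency(width, height, H, V, R, k_H=1.0, k_V=1.0, k_r=1.0):
    """Formulas (2)-(5): location saliency map from camera motion (H, V, R)."""
    i, j = np.mgrid[0:height, 0:width].astype(np.float32)
    # Formula (2): horizontal motion intensity.
    map_h = np.maximum(0.0, 1.0 - np.abs(j - width / 2 - k_H * H) / (width / 2))
    # Formula (3): vertical motion intensity.
    map_v = np.maximum(0.0, 1.0 - np.abs(i - height / 2 - k_V * V) / (height / 2))
    # Formula (4): radial motion intensity, with r normalized by r_max.
    r = np.hypot(i - height / 2, j - width / 2)
    r /= r.max() + 1e-8
    map_r = 1.0 - r if R >= 0 else k_r * r
    # Formula (5): Map_loc = Map_H + Map_V + Map_R.
    return map_h + map_v + map_r
```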
1.3 Dynamic attention analysis
The method of the present invention analyzes dynamic attention from two perspectives: the video viewer and the videographer. From the viewer's perspective, the invention analyzes which regions attract more human attention; from the videographer's perspective, it studies which regions the videographer intends to record.
In the method of the invention, under a moving camera, the regions that simultaneously draw the attention of the videographer and the viewer are the dynamic regions of interest. Moreover, the motion intensity of a dynamic region of interest is neither the maximum nor the minimum, and its visual saliency is inversely related to the motion intensity.
The present invention detects motion intensity with optical flow and denotes it UV. The mean and standard deviation of the motion intensity of each frame are important statistics. The dynamic saliency map Map_motion(x, y) is expressed as:
Map_motion(x, y) = 0, if UV(x, y) > Mean + δ × SD;
Map_motion(x, y) = 0, if UV(x, y) < max(Mean − δ × SD, UB);
Map_motion(x, y) = 1 − UV(x, y), otherwise    (6)
where Mean and SD denote the mean and standard deviation, δ is a tolerance coefficient, and UB is a bound guarding against false optical-flow detections in distant texture-poor regions.
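A sketch of formula (6) under stated assumptions: Farneback dense optical flow stands in for the unspecified optical flow method, UV is normalized to [0, 1], and the values of delta and ub are illustrative choices for δ and UB.

```python
import numpy as np
import cv2

def dynamic_saliency(prev_gray, next_gray, delta=1.0, ub=0.05):
    """Formula (6): dynamic saliency map of the earlier of two adjacent frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    uv = np.hypot(flow[..., 0], flow[..., 1])
    uv /= uv.max() + 1e-8                       # motion intensity UV in [0, 1]
    mean, sd = uv.mean(), uv.std()
    sal = 1.0 - uv                              # inverse to motion intensity
    sal[uv > mean + delta * sd] = 0.0           # too fast: suppressed
    sal[uv < max(mean - delta * sd, ub)] = 0.0  # too slow or false flow: suppressed
    return sal
```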
1.4 Attention fusion
The static saliency map represents the stationary objects that attract the viewer's interest. The location saliency map describes the distribution of human visual sensitivity: salient regions of high visual sensitivity gain attention more easily than regions of low visual sensitivity. Therefore, by multiplying the static saliency map by the location saliency map, the present invention obtains a location-enhanced static saliency map. The dynamic saliency map describes which motions in the video attract the human visual system more easily.
The present invention proposes a dynamic fusion algorithm in which the weights of the static and dynamic saliency are decided by the ratio between the means of the static and dynamic saliency maps. The final saliency map Map_fusion is expressed as follows:
Map_fusion = Map_motion × λ + Map_loc × Map_static × (1 − λ)    (7)
λ = Mean_motion / (Mean_motion + Mean_static)    (8)
where λ is the weight of the dynamic attention, Map_motion is the dynamic saliency map, Map_loc the location saliency map, Map_static the static saliency map, and Mean_static and Mean_motion the means of the static and dynamic saliency maps.
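Formulas (7) and (8) then reduce to a few lines; the sketch below assumes the three maps are same-shaped arrays normalized to [0, 1].

```python
def fuse_saliency(map_static, map_loc, map_motion):
    """Formulas (7)-(8): dynamically weighted fusion of the three saliency maps."""
    mean_motion, mean_static = map_motion.mean(), map_static.mean()
    lam = mean_motion / (mean_motion + mean_static + 1e-8)      # formula (8)
    # Formula (7): location-enhanced static map weighted against the motion map.
    return map_motion * lam + map_loc * map_static * (1.0 - lam)
```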
2 Video keyframe extraction
To select video frames for three-dimensional reconstruction, the present invention proposes a new video keyframe extraction algorithm in three parts. First, all frames are clustered into k classes using the GIST global feature descriptor. Then, for each class, a class saliency map is obtained by averaging all the saliency maps in that class. By computing the distance between each frame's saliency map and the class saliency map, the invention selects a predetermined proportion of the frames as the candidate keyframe set. Any k frames from the candidate set form a frame group, provided they come from different classes. The invention finally ranks all frame groups according to the geometric and visual constraints and determines the keyframe group.
2.1 GIST global feature clustering
The purpose of the clustering is to represent the video content by a small number of representative viewpoints. If many images are taken from similar viewpoints, similar images are bound to appear among them, and such similar images can be described with low-dimensional global features. The present invention clusters the GIST global features with the K-means method; the GIST feature has proven effective for clustering images.
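A minimal sketch of this clustering step, assuming one GIST descriptor per frame has already been computed by an external routine (the descriptor computation itself is not detailed here):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_frames(gist_descriptors, k):
    """Cluster per-frame GIST descriptors into k viewpoint classes.

    gist_descriptors: (n_frames, d) array, one GIST vector per frame.
    Returns the class label of each frame.
    """
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        np.asarray(gist_descriptors))
```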
2.2 Keyframe candidate set generation
The present invention obtains the class saliency map of each cluster class by averaging all the saliency maps in that class. Computing the Euclidean distance between a frame's saliency map and the class saliency map, the invention ranks the frames within each class. From each cluster class, the frames of a predetermined proportion that are closest to the class saliency map constitute the keyframe candidate set, with at least one frame selected per class. The final keyframes come from this candidate set. The sampling rate is computed as follows:
η = 1/(n/k)    (9)
Here, η is the sampling rate, n is the total number of frames in the video, and k is the number of classes.
For each class, the number of selected frames is computed as follows:
S_i = η × n_i    (10)
Here, S_i is the number of frames selected from the i-th class and n_i is the total number of frames in the i-th class.
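Combining formulas (9) and (10), candidate-set generation can be sketched as follows; `saliency_maps` and `labels` are assumed outputs of the fusion and clustering steps above, and the max(1, ·) guard encodes the at-least-one-frame-per-class rule.

```python
import numpy as np

def candidate_keyframes(saliency_maps, labels, k):
    """Section 2.2: per-class candidate keyframe selection.

    saliency_maps: (n, h, w) array of fused per-frame saliency maps;
    labels: per-frame cluster labels. Returns candidate frame indices.
    """
    labels = np.asarray(labels)
    n = len(saliency_maps)
    eta = k / n                                    # sampling rate, formula (9)
    flat = saliency_maps.reshape(n, -1)
    candidates = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        class_map = flat[idx].mean(axis=0)         # class saliency map
        dist = np.linalg.norm(flat[idx] - class_map, axis=1)
        s_i = max(1, int(round(eta * len(idx))))   # formula (10), at least 1
        candidates.extend(idx[np.argsort(dist)[:s_i]])
    return np.array(sorted(candidates))
```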
2.3 Keyframe extraction
To select the keyframe group needed for the three-dimensional reconstruction, the present invention ranks all keyframe groups by the geometric and visual constraints.
The geometric constraint guarantees that the video frames in the selected keyframe group cover overlapping regions in three-dimensional space. The present invention extracts SIFT features and estimates the fundamental matrix between images with the random sample consensus (RANSAC) algorithm. For a given frame group, there are matched points between each frame and the other frames; the total number of matched points in the frame group is a new representative feature of that group, called the geometric constraint score. The present invention sorts all frame groups in descending order of the geometric constraint score.
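A sketch of the geometric constraint score as just described, using OpenCV's SIFT detector and RANSAC fundamental-matrix estimation; the ratio-test threshold of 0.75 is an assumed detail, not taken from the patent.

```python
import numpy as np
import cv2

def geometric_score(frames):
    """Geometric constraint score of one frame group: total number of
    RANSAC inlier matches (SIFT + fundamental matrix) over all frame pairs."""
    sift = cv2.SIFT_create()
    feats = [sift.detectAndCompute(f, None) for f in frames]
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    score = 0
    for a in range(len(frames)):
        for b in range(a + 1, len(frames)):
            (kp1, d1), (kp2, d2) = feats[a], feats[b]
            if d1 is None or d2 is None:
                continue
            good = []   # Lowe's ratio test on 2-nearest-neighbour matches
            for pair in matcher.knnMatch(d1, d2, k=2):
                if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                    good.append(pair[0])
            if len(good) < 8:   # a fundamental matrix needs 8+ correspondences
                continue
            p1 = np.float32([kp1[m.queryIdx].pt for m in good])
            p2 = np.float32([kp2[m.trainIdx].pt for m in good])
            _, inliers = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC)
            if inliers is not None:
                score += int(inliers.sum())
    return score
```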
Different frames in the same frame group come from different viewpoints. The visual constraint expresses which real-world content can be seen from a given viewpoint. Within a frame group, the present invention can recover the visual order of each frame. For a given frame group, the visual loss (VL) is defined as follows:
VL = Σ_{i=2}^{k−1} |(O_{i−1} + O_{i+1}) / 2 − O_i|    (11)
Here, k is the number of clusters and O_i denotes the viewpoint rank of the i-th frame. VL is the visual constraint score. The present invention sorts all frame groups in ascending order of the visual constraint score VL.
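Formula (11) itself reduces to a few lines, assuming the viewpoint ranks O_i of the group have been recovered:

```python
def visual_loss(viewpoint_ranks):
    """Formula (11): visual constraint score VL of one frame group; the sum
    runs over the interior frames of the ordered group (lower is better)."""
    o = viewpoint_ranks
    return sum(abs((o[i - 1] + o[i + 1]) / 2 - o[i])
               for i in range(1, len(o) - 1))
```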
For each frame group, the present invention sums its positions in the two orderings above; the frame group with the minimum sum is taken as the keyframe group. If several frame groups attain the same minimum, each of them can be chosen as a keyframe group.
3 Attention-enhanced three-dimensional reconstruction
The present invention proposes an attention-enhanced three-dimensional reconstruction method to improve the reconstruction results. The method of the invention does not require pre-calibrated cameras. Compared with previous three-dimensional reconstruction methods, the attention-enhanced reconstruction method of the invention not only highlights the regions of interest but also saves computation.
First, the present invention recovers the camera parameters of the keyframes automatically by the structure-from-motion method. Then, in each keyframe, corner points are detected with the difference-of-Gaussians (DoG) and Harris detectors. For each keyframe, the region of interest consists of the regions of high visual saliency. Using the frame saliency map, the invention deletes those detected features lying outside the region of interest. Finally, the remaining features are supplied for recovering the three-dimensional information, through a simple match-expand-filter process: 1) initial feature matching: the remaining image features are matched under the epipolar constraint between different frames, forming a sparse patch distribution in the salient regions; given these initial matches, the following two steps are repeated n times; 2) patch expansion: the initial matched patches are spread to their surroundings, obtaining a dense patch distribution; 3) patch filtering: erroneously matched patches are eliminated according to the visual constraint.
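A sketch of the saliency-gated feature detection used in this step; SIFT (a DoG-based detector) and OpenCV's Harris-based goodFeaturesToTrack stand in for the named detectors, and the saliency threshold of 0.5 defining the region of interest is an assumption.

```python
import numpy as np
import cv2

def roi_features(keyframe_gray, saliency, thresh=0.5):
    """Detect corner features, then keep only those lying inside the
    salient region of the frame saliency map (values in [0, 1])."""
    kps = list(cv2.SIFT_create().detect(keyframe_gray, None))
    corners = cv2.goodFeaturesToTrack(keyframe_gray, maxCorners=2000,
                                      qualityLevel=0.01, minDistance=5,
                                      useHarrisDetector=True)
    if corners is not None:
        kps += [cv2.KeyPoint(float(x), float(y), 3.0)
                for x, y in corners.reshape(-1, 2)]
    # Delete features outside the region of interest.
    return [kp for kp in kps
            if saliency[int(kp.pt[1]), int(kp.pt[0])] >= thresh]
```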
4 Implementation results
To evaluate the present invention, two groups of experiments were designed, in an indoor and an outdoor environment respectively. For both groups, the invention reports the test results of the visual attention analysis, the video keyframe extraction, and the three-dimensional reconstruction.
4.1 Visual attention analysis test
Fig. 2 shows the visual attention analysis results in the indoor environment. The two frames in Fig. 2(a) are the attention analysis results of keyframes extracted from different viewpoints. It is evident that neither the static saliency map nor the dynamic saliency map alone can accurately describe the position and contour of the object, whereas the fused saliency map provides a better description.
Fig. 3 shows the visual attention analysis results in the outdoor environment. The two frames in Fig. 3(a) are the attention analysis results of keyframes extracted from different viewpoints. It is evident that neither the static saliency map nor the dynamic saliency map alone can accurately describe the position and contour of the object; in particular, the dynamic saliency map labels too many non-target regions as regions of interest, whereas the fused saliency map provides a better description.
Figs. 2 and 3 demonstrate that the video attention analysis of the present invention is effective.
Fig. 2: visual attention analysis results in the indoor environment. Fig. 2(a) is the original image, Fig. 2(b) the static saliency map, Fig. 2(c) the location saliency map, Fig. 2(d) the dynamic saliency map, and Fig. 2(e) the fused saliency map.
Fig. 3: visual attention analysis results in the outdoor environment. Fig. 3(a) is the original image, Fig. 3(b) the static saliency map, Fig. 3(c) the location saliency map, Fig. 3(d) the dynamic saliency map, and Fig. 3(e) the fused saliency map.
4.2 Keyframe extraction experiment
The keyframe extraction results are shown in Fig. 4 (indoor environment) and Fig. 5 (outdoor environment). The frames outlined in yellow mark the saliency maps of frames whose visual attention results are relatively poor. As can be seen, among the keyframes selected by the present invention only a minority of frames have poor saliency results; the regions of interest of the others are described well.
Fig. 4: video keyframe extraction results in the indoor environment. Fig. 4(a) shows the keyframes extracted with the method of the present invention, and Fig. 4(b) the saliency maps corresponding to Fig. 4(a). The yellow boxes mark the saliency maps of frames whose visual attention results are relatively poor.
Fig. 5: video keyframe extraction results in the outdoor environment. Figs. 5(a) and 5(b) are the keyframes extracted with the method of the present invention, and Figs. 5(c) and 5(d) the corresponding visual attention maps. The thick boxes in Fig. 5(c) mark the attention maps of frames whose visual attention results are relatively poor.
4.3 Evaluation of the three-dimensional reconstruction
The evaluation of the three-dimensional reconstruction covers two aspects: the time taken and the quality of the reconstruction. Table 1 reports the time cost of the reconstruction; Fig. 6 is an example of indoor scene reconstruction and Fig. 7 an example of outdoor scene reconstruction.
As can be seen from Table 1, the method of the present invention has a great advantage in time and saves a large amount of reconstruction time. From the positions outlined in yellow in Figs. 6 and 7 it can be seen that the keyframes extracted by the invention support better three-dimensional reconstruction than keyframes obtained by random sampling. Meanwhile, Figs. 6(e) and 7(e) show that the method of the invention can achieve results similar to three-dimensional reconstruction from a panorama.
Table 1. Time cost of three-dimensional reconstruction

Scene            Original three-dimensional reconstruction    Method of the present invention
Indoor scene     4.3 hours                                    3.5 hours
Outdoor scene    8.7 hours                                    3.5 hours
Fig. 6: an example of indoor scene reconstruction. Fig. 6(a) is the original image; Fig. 6(b) the saliency map of Fig. 6(a); Fig. 6(c) the reconstruction result from video frames selected by random sampling; Fig. 6(d) the reconstruction result from the video keyframes selected by the method of the present invention; and Fig. 6(e) the reconstruction result using the keyframes selected by the invention together with the saliency map of each frame.
Fig. 7: an example of outdoor scene reconstruction. Fig. 7(a) is the original image; Fig. 7(b) the saliency map of Fig. 7(a); Fig. 7(c) the reconstruction result from video frames selected by random sampling; Fig. 7(d) the reconstruction result from the video keyframes selected by the method of the present invention; and Fig. 7(e) the reconstruction result using the keyframes selected by the invention together with the saliency map of each frame.
The above is only a specific embodiment of the present invention, but the protection scope of the invention is not limited thereto. Any variation or substitution that a person familiar with this technology can readily conceive within the technical scope disclosed by the invention shall be covered by the protection scope of the claims of the invention.

Claims (4)

1. An attention-based object three-dimensional reconstruction method, characterized in that it improves the quality of the three-dimensional reconstruction and accelerates it by analyzing the regions of interest in the video frames, comprising the following steps:
Step S1: segment the video used for three-dimensional reconstruction into video frames, analyze the distribution of visual attention in each frame from the static, location, and dynamic aspects, and obtain the corresponding static, location, and dynamic saliency maps; fuse the saliency maps of the three aspects to obtain a video-based saliency map for each frame, the salient regions described by the saliency map being the regions of interest in the three-dimensional reconstruction;
Step S2: cluster all video frames using the GIST global feature, select a candidate keyframe set according to the saliency map generated for each frame, and finally extract the video keyframes used for three-dimensional reconstruction by analyzing geometric and visual constraints;
Step S3: using the video keyframes and their corresponding saliency maps, perform three-dimensional reconstruction only on the salient regions of the frames, so as to obtain an accurate three-dimensional model of the regions of interest and to accelerate the reconstruction.
2. The attention-based object three-dimensional reconstruction method according to claim 1, characterized in that the visual attention analysis of the video comprises static attention analysis, location attention analysis, dynamic attention analysis, and attention fusion;
For each video frame, static attention analysis is performed with a method combining contrast-based and information-theoretic measures, obtaining the static saliency map;
For each video frame, the camera motion is described in the horizontal, vertical, and radial aspects and location attention analysis is performed by integral template matching, obtaining the location saliency map;
For each pair of adjacent video frames, dynamic attention analysis is performed from the perspectives of both the video viewer and the videographer, obtaining the dynamic saliency map of the earlier of the two frames;
For the static, location, and dynamic saliency maps obtained for each frame, attention fusion is performed dynamically: the respective fusion weights are computed from the relation between the means of the static and dynamic saliency maps, finally yielding the fused visual saliency map of each frame.
3. The attention-based object three-dimensional reconstruction method according to claim 1, characterized in that the extraction of the video keyframes used for three-dimensional reconstruction comprises the following steps:
Step S21: first cluster all frames into k cluster classes using the GIST global feature descriptor;
Step S22: for each cluster class, obtain the class saliency map by averaging all saliency maps within the class;
Step S23: compute the distance between the saliency map of each frame in a cluster and the class saliency map, and from each cluster class select the 10% of frames whose distance to the class saliency map is smallest as the candidate keyframe set;
Step S24: form a frame group from any k frames of the candidate keyframe set, provided they come from different classes; rank all frame groups according to the geometric and visual constraints and finally determine the keyframe group.
4. The attention-based object three-dimensional reconstruction method according to claim 1, characterized in that the step of performing three-dimensional reconstruction only on the salient regions of the video frames is as follows:
Step S31: recover the camera parameters of the keyframes automatically by the structure-from-motion method; then detect corner points in each keyframe with the difference-of-Gaussians and Harris detectors; describe the region of interest of each keyframe by its visual saliency values; delete, using the frame saliency map, those detected features that lie outside the region of interest; finally, supply the features lying inside the region of interest for recovering the three-dimensional information;
Step S32: match the image features lying in the regions of interest using the epipolar constraint between two images, thereby forming a sparse patch distribution in the salient regions and obtaining the initial matched patches;
Step S33: repeat n times the expansion of the initial matched patches into their surroundings, obtaining a dense patch distribution;
Step S34: repeat n times, on the dense patch distribution, the elimination of erroneously matched patches according to the visual constraint, realizing the attention-enhanced three-dimensional reconstruction.
CN 201010574274 2010-11-30 2010-11-30 Three-dimensional reconstruction method of target based on attention Pending CN102034267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010574274 CN102034267A (en) 2010-11-30 2010-11-30 Three-dimensional reconstruction method of target based on attention

Publications (1)

Publication Number Publication Date
CN102034267A true CN102034267A (en) 2011-04-27

Family

ID=43887119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010574274 Pending CN102034267A (en) 2010-11-30 2010-11-30 Three-dimensional reconstruction method of target based on attention

Country Status (1)

Country Link
CN (1) CN102034267A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101350101A (en) * 2008-09-09 2009-01-21 北京航空航天大学 Method for auto-registration of multi-amplitude deepness image
CN101651772A (en) * 2009-09-11 2010-02-17 宁波大学 Method for extracting video interested region based on visual attention
CN101877143A (en) * 2009-12-09 2010-11-03 中国科学院自动化研究所 Three-dimensional scene reconstruction method of two-dimensional image group
CN101777059A (en) * 2009-12-16 2010-07-14 中国科学院自动化研究所 Method for extracting landmark scene abstract

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xian Xiao; Changsheng Xu; Yong Rui, "Video Based 3D Reconstruction Using Spatio-Temporal Attention Analysis", Multimedia and Expo, 2010-07-23, pp. 1091-1096 (relevant to claims 1-4) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102496024A (en) * 2011-11-25 2012-06-13 山东大学 Method for detecting incident triggered by characteristic frame in intelligent monitor
CN102496024B (en) * 2011-11-25 2014-03-12 山东大学 Method for detecting incident triggered by characteristic frame in intelligent monitor
CN102768767A (en) * 2012-08-06 2012-11-07 中国科学院自动化研究所 Online three-dimensional reconstructing and locating method for rigid body
CN104951495A (en) * 2014-03-28 2015-09-30 韩国电子通信研究院 Apparatus and method for managing representative video images
CN104951495B (en) * 2014-03-28 2019-02-05 韩国电子通信研究院 Device and method for Management Representative video image
CN104021544B (en) * 2014-05-07 2018-11-23 中国农业大学 A kind of greenhouse vegetable disease monitor video extraction method of key frame, that is, extraction system
CN106372636A (en) * 2016-08-25 2017-02-01 上海交通大学 HOG-TOP-based video significance detection method
CN106875437A (en) * 2016-12-27 2017-06-20 北京航空航天大学 A kind of extraction method of key frame towards RGBD three-dimensional reconstructions
CN109508642A (en) * 2018-10-17 2019-03-22 杭州电子科技大学 Ship monitor video key frame extracting method based on two-way GRU and attention mechanism
CN109508642B (en) * 2018-10-17 2021-08-17 杭州电子科技大学 Ship monitoring video key frame extraction method based on bidirectional GRU and attention mechanism
CN110322453A (en) * 2019-07-05 2019-10-11 西安电子科技大学 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN111105460A (en) * 2019-12-26 2020-05-05 电子科技大学 RGB-D camera pose estimation method for indoor scene three-dimensional reconstruction
CN112805723A (en) * 2020-03-06 2021-05-14 华为技术有限公司 Image processing system and method and automatic driving vehicle comprising system
CN113450459A (en) * 2020-03-25 2021-09-28 北京四维图新科技股份有限公司 Method and device for constructing three-dimensional model of target object
CN113450459B (en) * 2020-03-25 2024-03-22 北京四维图新科技股份有限公司 Method and device for constructing three-dimensional model of target object
CN114598809A (en) * 2022-01-18 2022-06-07 影石创新科技股份有限公司 Method for selecting view angle of panoramic video, electronic device, computer program product and readable storage medium
CN114598809B (en) * 2022-01-18 2024-06-18 影石创新科技股份有限公司 Panoramic video view angle selection method, electronic equipment and readable storage medium
CN116295097A (en) * 2023-02-15 2023-06-23 天津大学 Three-dimensional data set acquisition and evaluation method and device with material universality
CN116295097B (en) * 2023-02-15 2024-01-09 天津大学 Three-dimensional data set acquisition and evaluation method and device with material universality

Similar Documents

Publication Publication Date Title
CN102034267A (en) Three-dimensional reconstruction method of target based on attention
Sheng et al. UrbanLF: A comprehensive light field dataset for semantic segmentation of urban scenes
CN102867188B (en) Method for detecting seat state in meeting place based on cascade structure
Matzen et al. Nyc3dcars: A dataset of 3d vehicles in geographic context
CN108830252A (en) A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic
CN110188835B (en) Data-enhanced pedestrian re-identification method based on generative confrontation network model
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN109543695A (en) General density people counting method based on multiple dimensioned deep learning
CN104978567B (en) Vehicle checking method based on scene classification
CN104166841A (en) Rapid detection identification method for specified pedestrian or vehicle in video monitoring network
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
Chen et al. End-to-end learning of object motion estimation from retinal events for event-based object tracking
Nedović et al. Stages as models of scene geometry
CN104517095B (en) A kind of number of people dividing method based on depth image
CN107481279A (en) A kind of monocular video depth map computational methods
CN101877143A (en) Three-dimensional scene reconstruction method of two-dimensional image group
Liu et al. VisDrone-CC2021: the vision meets drone crowd counting challenge results
CN104835182A (en) Method for realizing dynamic object real-time tracking by using camera
CN109993269A (en) Single image people counting method based on attention mechanism
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN110503078A (en) A kind of remote face identification method and system based on deep learning
CN108280421A (en) Human bodys' response method based on multiple features Depth Motion figure
CN104063871A (en) Method for segmenting image sequence scene of wearable device
CN111680560A (en) Pedestrian re-identification method based on space-time characteristics
Diaz et al. Detecting dynamic objects with multi-view background subtraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110427