CN102034267A - Three-dimensional reconstruction method of target based on attention

Three-dimensional reconstruction method of target based on attention

Info

Publication number
CN102034267A
CN102034267A · CN201010574274A · CN 201010574274
Authority
CN
China
Prior art keywords
video
frame
saliency map
attention
dimensional reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010574274
Other languages
Chinese (zh)
Inventor
徐常胜
肖宪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN 201010574274 (published as CN102034267A)
Publication of CN102034267A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to an attention-based method for three-dimensional reconstruction of a target, comprising the following steps. Step S1: segment the video used for three-dimensional reconstruction into video frames, analyze the distribution of visual attention in the frames from the static, location, and dynamic aspects, and obtain the corresponding saliency maps; fuse the static, location, and dynamic saliency maps to obtain a video-based saliency map for each frame, the salient regions described by these maps being the regions of interest for the reconstruction. Step S2: cluster all video frames using the GIST global feature, select a candidate keyframe set according to the saliency map generated for each frame, and finally extract the video keyframes used for reconstruction by analyzing geometric and visual constraints. Step S3: using the video keyframes and their corresponding saliency maps, perform three-dimensional reconstruction only on the salient regions of the frames, thereby obtaining an accurate three-dimensional model of the regions of interest and increasing reconstruction speed.

Description

Attention-based method for three-dimensional reconstruction of an object
Technical field
The invention belongs to the technical fields of computer vision, image processing, and multimedia analysis, and relates to an attention-based method for three-dimensional reconstruction of an object.
Background art
With the development of digital imaging, high-quality video is becoming ever more abundant. Video-based three-dimensional reconstruction can exploit this rich video data to improve reconstruction precision and visual quality, and has therefore become a very popular research topic in computer vision, image processing, and multimedia analysis.
In general, video-based three-dimensional reconstruction systems fall into two classes: calibration-based systems and self-calibrating systems. Calibration-based systems require both images and camera parameters to perform the reconstruction; an example is the patch-based multi-view stereo method (PMVS), which recovers the three-dimensional structure of an object or scene by enforcing local photometric consistency and a global visibility constraint. Self-calibrating systems first estimate the camera parameters with a camera self-calibration algorithm and then recover a three-dimensional point cloud. Current approaches, however, only provide a reconstruction of the overall scene, whereas people usually attend only to the regions that attract their notice. Such methods waste a great deal of computation on reconstructing regions of no interest, and the resulting three-dimensional models cannot highlight the regions of interest.
People tend to attend to visually salient regions, and visual attention analysis can locate such salient regions; it has been widely applied in computer vision, artificial intelligence, and multimedia processing. Most previous work concentrates on still images and mainly uses static information. Recently, attention analysis for video has attracted more interest; a main approach combines static and location saliency maps to obtain the regions of interest in keyframes. Besides static and location attention, dynamic attention has likewise drawn notice and is widely used in region-of-interest detection based on spatio-temporal information. Many methods, such as optical flow, can be used to obtain motion vectors, but estimating motion vectors under a moving camera remains a challenging problem, and analyzing dynamic attention only from the viewer's perspective is insufficient.
Summary of the invention
To address the unsatisfactory accuracy of prior-art three-dimensional reconstruction, the object of the present invention is to propose a spatio-temporal attention-region detection method that enhances video-based three-dimensional reconstruction; to this end, an attention-based object three-dimensional reconstruction method is provided.
To achieve the above object, the technical solution of the attention-based object three-dimensional reconstruction method provided by the present invention improves the quality of the three-dimensional reconstruction and accelerates it by analyzing the regions of interest in the video frames, and comprises the following steps:
Step S1: segment the video used for three-dimensional reconstruction into video frames, analyze the distribution of visual attention in each frame from the static, location, and dynamic aspects, and obtain the corresponding static, location, and dynamic saliency maps; fuse the saliency maps of the three aspects to obtain a video-based saliency map for each frame, the salient regions described by the saliency map being the regions of interest in the three-dimensional reconstruction;
Step S2: cluster all video frames using the GIST global feature, select a candidate keyframe set according to the saliency map generated for each frame, and finally extract the video keyframes used for three-dimensional reconstruction by analyzing geometric and visual constraints;
Step S3: using the video keyframes and their corresponding saliency maps, perform three-dimensional reconstruction only on the salient regions of the frames, so as to obtain an accurate three-dimensional model of the regions of interest and to accelerate the reconstruction.
The visual attention analysis of the video comprises static attention analysis, location attention analysis, dynamic attention analysis, and attention fusion;
For each video frame, static attention analysis is performed with a method combining contrast-based and information-theoretic measures, obtaining the static saliency map;
For each video frame, the camera motion is described in the horizontal, vertical, and radial aspects and location attention analysis is performed by integral template matching, obtaining the location saliency map;
For each pair of adjacent video frames, dynamic attention analysis is performed from the perspectives of both the video viewer and the videographer, obtaining the dynamic saliency map of the earlier of the two frames;
For the static, location, and dynamic saliency maps obtained for each frame, attention fusion is performed dynamically: the respective fusion weights are computed from the relation between the means of the static and dynamic saliency maps, finally yielding the fused visual saliency map of each frame.
The extraction of the video keyframes used for three-dimensional reconstruction comprises the following steps:
Step S21: first cluster all frames into k cluster classes using the GIST global feature descriptor;
Step S22: for each cluster class, obtain the class saliency map by averaging all saliency maps within the class;
Step S23: compute the distance between the saliency map of each frame in a cluster and the class saliency map, and from each cluster class select the 10% of frames whose distance to the class saliency map is smallest as the candidate keyframe set;
Step S24: form a frame group from any k frames of the candidate keyframe set, provided they come from different classes; rank all frame groups according to the geometric and visual constraints and finally determine the keyframe group.
The step of performing three-dimensional reconstruction only on the salient regions of the video frames is as follows:
Step S31: recover the camera parameters of the keyframes automatically by the structure-from-motion method; then detect corner points in each keyframe with the difference-of-Gaussians and Harris detectors; describe the region of interest of each keyframe by its visual saliency values; delete, using the frame saliency map, those detected features that lie outside the region of interest; finally, supply the features lying inside the region of interest for recovering the three-dimensional information;
Step S32: match the image features lying in the regions of interest using the epipolar constraint between two images, thereby forming a sparse patch distribution in the salient regions and obtaining the initial matched patches;
Step S33: repeat n times the expansion of the initial matched patches into their surroundings, obtaining a dense patch distribution;
Step S34: repeat n times, on the dense patch distribution, the elimination of erroneously matched patches according to the visual constraint, realizing the attention-enhanced three-dimensional reconstruction.
Beneficial effects of the invention: by performing visual attention analysis on each video frame, the invention obtains fairly accurate regions of interest in each frame; and by clustering the video frames on global features and extracting keyframes under the visual and geometric constraints, it obtains the video keyframes, and the regions of interest within them, that benefit the three-dimensional reconstruction. Performing three-dimensional reconstruction on the features inside the regions of interest of the keyframes yields accurate reconstruction results and increases the speed of the reconstruction.
Indoor and outdoor experiments in real environments demonstrate that the method of the invention achieves higher accuracy and higher computational efficiency.
Description of drawings
Fig. 1 is the framework of the visual-attention-based object three-dimensional reconstruction of the present invention.
Fig. 2 shows the visual attention analysis results in an indoor environment.
Fig. 3 shows the visual attention analysis results in an outdoor environment.
Fig. 4 shows the video keyframe extraction results in an indoor environment.
Fig. 5 shows the video keyframe extraction results in an outdoor environment.
Fig. 6 is an example of indoor scene reconstruction.
Fig. 7 is an example of outdoor scene reconstruction.
Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
Considering the wide application of visual attention analysis in region-of-interest detection, the present invention proposes a spatio-temporal attention-region detection method to enhance video-based three-dimensional reconstruction. The method accommodates the characteristics of video-based reconstruction: unknown camera motion and the sudden appearance of objects in the video (for example birds, pedestrians, or passing vehicles). Compared with traditional three-dimensional reconstruction methods, the method of the invention obtains a more accurate three-dimensional model at a lower computational cost. The computers used in the invention all run the Windows XP operating system on a dual-core 2.2 GHz processor with 2 GB of memory. Fig. 1 shows the framework of the visual-attention-based object three-dimensional reconstruction, which comprises three parts: 1) video-based visual attention analysis; 2) video keyframe extraction; 3) region-of-interest-enhanced video three-dimensional reconstruction. Each part is described below.
1 Video-based visual attention analysis
Video-based visual attention analysis comprises four parts: static attention analysis, location attention analysis, dynamic attention analysis, and attention fusion. Static attention analysis concerns the static objects in a video frame that attract human visual attention. Location attention analysis concerns the attention induced by position within the frame, which shifts as the camera moves. Dynamic attention analysis concerns the attention drawn by moving objects in adjacent frames; by analyzing the motion intensity of every pixel in a frame, a dynamic saliency map of the attention-drawing regions is obtained. After visual attention has been analyzed in the static, location, and dynamic aspects, the present invention fuses the results of the three analyses to obtain the final visual attention analysis.
1.1 Static attention analysis
Stationary objects can attract human attention; this is so-called static attention. Contrast-based attention analysis introduces the center-surround structure of feature contrast in the human visual system (HVS). Information-theoretic methods rest on the premise that visual attention is drawn to maximally informative samples. Contrast and information sampling are thus the two factors for computing saliency. The present invention fuses the contrast-based method and the information-theoretic method to compute the static saliency map, as in formula (1):
Map_static(x, y) = Con(x, y) × ID(x, y)    (1)
where Map_static(x, y) is the static saliency value at point (x, y), Con(x, y) is the normalized contrast feature, and ID(x, y) is the information feature.
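As an illustration of formula (1), the following minimal Python sketch computes a static saliency map. The patent does not fix the exact contrast and information measures, so a center-surround contrast and the per-pixel Shannon self-information are used here as assumed stand-ins for Con(x, y) and ID(x, y).

```python
import numpy as np
import cv2

def static_saliency(gray):
    """Formula (1): Map_static(x, y) = Con(x, y) x ID(x, y), elementwise."""
    g = gray.astype(np.float32) / 255.0
    # Con(x, y): center-surround contrast, i.e. deviation from a blurred
    # surround mean, normalized to [0, 1].
    surround = cv2.GaussianBlur(g, (0, 0), sigmaX=8)
    con = np.abs(g - surround)
    con /= con.max() + 1e-8
    # ID(x, y): self-information -log p(v) of each pixel's intensity v,
    # normalized to [0, 1] (an assumed, simple information measure).
    levels = (g * 255).astype(np.uint8)
    p = np.bincount(levels.ravel(), minlength=256) / levels.size
    info = -np.log(p[levels] + 1e-8)
    info /= info.max()
    return con * info
```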
1.2 Location attention analysis
The present invention describes the camera motion in three aspects, horizontal (H), vertical (V), and radial (R), using the integral template matching technique. With the three-parameter model of H, V, and R, the following three formulas compute the camera motion intensity in the horizontal direction, Map_H(i, j), the vertical direction, Map_V(i, j), and the radial direction, Map_R(i, j), as in formulas (2)-(4):
Map_H(i, j) = max(0, 1 − |j − width/2 − k_H × H| / (width/2))    (2)
Map_V(i, j) = max(0, 1 − |i − height/2 − k_V × V| / (height/2))    (3)
Map_R(i, j) = 1 − r/r_max if R ≥ 0, and k_r × r/r_max if R < 0    (4)
Here, i and j give the pixel position, r denotes the distance from the pixel to the frame center, and r_max is the maximum value of r. k_H, k_V, and k_r are constants; H denotes the horizontal motion of the camera, V its vertical motion, and R its radial motion; max takes the maximum value; width and height denote the pixel width and height of the current video frame.
The final location saliency map of the camera, Map_loc, is expressed as follows (formula (5)):
Map_loc = Map_H + Map_V + Map_R    (5)
where Map_H denotes the camera motion intensity in the horizontal direction, Map_V the motion intensity in the vertical direction, and Map_R the radial motion intensity.
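Formulas (2)-(5) translate directly into array operations. A minimal sketch, assuming the motion parameters H, V, and R have already been estimated (for example by the integral template matching above) and using illustrative values for the constants k_H, k_V, and k_r:

```python
import numpy as np

def location_saliency(width, height, H, V, R, k_H=1.0, k_V=1.0, k_r=1.0):
    """Formulas (2)-(5): location saliency map from camera motion (H, V, R)."""
    i, j = np.mgrid[0:height, 0:width].astype(np.float32)
    # Formula (2): horizontal motion intensity.
    map_h = np.maximum(0.0, 1.0 - np.abs(j - width / 2 - k_H * H) / (width / 2))
    # Formula (3): vertical motion intensity.
    map_v = np.maximum(0.0, 1.0 - np.abs(i - height / 2 - k_V * V) / (height / 2))
    # Formula (4): radial motion intensity, with r normalized by r_max.
    r = np.hypot(i - height / 2, j - width / 2)
    r /= r.max() + 1e-8
    map_r = 1.0 - r if R >= 0 else k_r * r
    # Formula (5): Map_loc = Map_H + Map_V + Map_R.
    return map_h + map_v + map_r
```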
1.3 Dynamic attention analysis
The method of the present invention analyzes dynamic attention from two perspectives: the video viewer and the videographer. From the viewer's perspective, the invention analyzes which regions attract more human attention; from the videographer's perspective, it studies which regions the videographer intends to record.
In the method of the invention, under a moving camera, the regions that simultaneously draw the attention of the videographer and the viewer are the dynamic regions of interest. Moreover, the motion intensity of a dynamic region of interest is neither the maximum nor the minimum, and its visual saliency is inversely related to the motion intensity.
The present invention detects motion intensity with optical flow and denotes it UV. The mean and standard deviation of the motion intensity of each frame are important statistics. The dynamic saliency map Map_motion(x, y) is expressed as:
Map_motion(x, y) = 0, if UV(x, y) > Mean + δ × SD;
Map_motion(x, y) = 0, if UV(x, y) < max(Mean − δ × SD, UB);
Map_motion(x, y) = 1 − UV(x, y), otherwise    (6)
where Mean and SD denote the mean and standard deviation, δ is a tolerance coefficient, and UB is a bound guarding against false optical-flow detections in distant texture-poor regions.
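A sketch of formula (6) under stated assumptions: Farneback dense optical flow stands in for the unspecified optical flow method, UV is normalized to [0, 1], and the values of delta and ub are illustrative choices for δ and UB.

```python
import numpy as np
import cv2

def dynamic_saliency(prev_gray, next_gray, delta=1.0, ub=0.05):
    """Formula (6): dynamic saliency map of the earlier of two adjacent frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    uv = np.hypot(flow[..., 0], flow[..., 1])
    uv /= uv.max() + 1e-8                       # motion intensity UV in [0, 1]
    mean, sd = uv.mean(), uv.std()
    sal = 1.0 - uv                              # inverse to motion intensity
    sal[uv > mean + delta * sd] = 0.0           # too fast: suppressed
    sal[uv < max(mean - delta * sd, ub)] = 0.0  # too slow or false flow: suppressed
    return sal
```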
1.4 Attention fusion
The static saliency map represents the stationary objects that attract the viewer's interest. The location saliency map describes the distribution of human visual sensitivity: salient regions of high visual sensitivity gain attention more easily than regions of low visual sensitivity. Therefore, by multiplying the static saliency map by the location saliency map, the present invention obtains a location-enhanced static saliency map. The dynamic saliency map describes which motions in the video attract the human visual system more easily.
The present invention proposes a dynamic fusion algorithm in which the weights of the static and dynamic saliency are decided by the ratio between the means of the static and dynamic saliency maps. The final saliency map Map_fusion is expressed as follows:
Map_fusion = Map_motion × λ + Map_loc × Map_static × (1 − λ)    (7)
λ = Mean_motion / (Mean_motion + Mean_static)    (8)
where λ is the weight of the dynamic attention, Map_motion is the dynamic saliency map, Map_loc the location saliency map, Map_static the static saliency map, and Mean_static and Mean_motion the means of the static and dynamic saliency maps.
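Formulas (7) and (8) then reduce to a few lines; the sketch below assumes the three maps are same-shaped arrays normalized to [0, 1].

```python
def fuse_saliency(map_static, map_loc, map_motion):
    """Formulas (7)-(8): dynamically weighted fusion of the three saliency maps."""
    mean_motion, mean_static = map_motion.mean(), map_static.mean()
    lam = mean_motion / (mean_motion + mean_static + 1e-8)      # formula (8)
    # Formula (7): location-enhanced static map weighted against the motion map.
    return map_motion * lam + map_loc * map_static * (1.0 - lam)
```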
2 Video keyframe extraction
To select video frames for three-dimensional reconstruction, the present invention proposes a new video keyframe extraction algorithm in three parts. First, all frames are clustered into k classes using the GIST global feature descriptor. Then, for each class, a class saliency map is obtained by averaging all the saliency maps in that class. By computing the distance between each frame's saliency map and the class saliency map, the invention selects a predetermined proportion of the frames as the candidate keyframe set. Any k frames from the candidate set form a frame group, provided they come from different classes. The invention finally ranks all frame groups according to the geometric and visual constraints and determines the keyframe group.
2.1 GIST global feature clustering
The purpose of the clustering is to represent the video content by a small number of representative viewpoints. If many images are taken from similar viewpoints, similar images are bound to appear among them, and such similar images can be described with low-dimensional global features. The present invention clusters the GIST global features with the K-means method; the GIST feature has proven effective for clustering images.
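A minimal sketch of this clustering step, assuming one GIST descriptor per frame has already been computed by an external routine (the descriptor computation itself is not detailed here):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_frames(gist_descriptors, k):
    """Cluster per-frame GIST descriptors into k viewpoint classes.

    gist_descriptors: (n_frames, d) array, one GIST vector per frame.
    Returns the class label of each frame.
    """
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        np.asarray(gist_descriptors))
```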
2.2 Keyframe candidate set generation
The present invention obtains the class saliency map of each cluster class by averaging all the saliency maps in that class. Computing the Euclidean distance between a frame's saliency map and the class saliency map, the invention ranks the frames within each class. From each cluster class, the frames of a predetermined proportion that are closest to the class saliency map constitute the keyframe candidate set, with at least one frame selected per class. The final keyframes come from this candidate set. The sampling rate is computed as follows:
η = 1/(n/k)    (9)
Here, η is the sampling rate, n is the total number of frames in the video, and k is the number of classes.
For each class, the number of selected frames is computed as follows:
S_i = η × n_i    (10)
Here, S_i is the number of frames selected from the i-th class and n_i is the total number of frames in the i-th class.
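Combining formulas (9) and (10), candidate-set generation can be sketched as follows; `saliency_maps` and `labels` are assumed outputs of the fusion and clustering steps above, and the max(1, ·) guard encodes the at-least-one-frame-per-class rule.

```python
import numpy as np

def candidate_keyframes(saliency_maps, labels, k):
    """Section 2.2: per-class candidate keyframe selection.

    saliency_maps: (n, h, w) array of fused per-frame saliency maps;
    labels: per-frame cluster labels. Returns candidate frame indices.
    """
    labels = np.asarray(labels)
    n = len(saliency_maps)
    eta = k / n                                    # sampling rate, formula (9)
    flat = saliency_maps.reshape(n, -1)
    candidates = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        class_map = flat[idx].mean(axis=0)         # class saliency map
        dist = np.linalg.norm(flat[idx] - class_map, axis=1)
        s_i = max(1, int(round(eta * len(idx))))   # formula (10), at least 1
        candidates.extend(idx[np.argsort(dist)[:s_i]])
    return np.array(sorted(candidates))
```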
2.3 Keyframe extraction
To select the keyframe group needed for the three-dimensional reconstruction, the present invention ranks all keyframe groups by the geometric and visual constraints.
The geometric constraint guarantees that the video frames in the selected keyframe group cover overlapping regions in three-dimensional space. The present invention extracts SIFT features and estimates the fundamental matrix between images with the random sample consensus (RANSAC) algorithm. For a given frame group, there are matched points between each frame and the other frames; the total number of matched points in the frame group is a new representative feature of that group, called the geometric constraint score. The present invention sorts all frame groups in descending order of the geometric constraint score.
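A sketch of the geometric constraint score as just described, using OpenCV's SIFT detector and RANSAC fundamental-matrix estimation; the ratio-test threshold of 0.75 is an assumed detail, not taken from the patent.

```python
import numpy as np
import cv2

def geometric_score(frames):
    """Geometric constraint score of one frame group: total number of
    RANSAC inlier matches (SIFT + fundamental matrix) over all frame pairs."""
    sift = cv2.SIFT_create()
    feats = [sift.detectAndCompute(f, None) for f in frames]
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    score = 0
    for a in range(len(frames)):
        for b in range(a + 1, len(frames)):
            (kp1, d1), (kp2, d2) = feats[a], feats[b]
            if d1 is None or d2 is None:
                continue
            good = []   # Lowe's ratio test on 2-nearest-neighbour matches
            for pair in matcher.knnMatch(d1, d2, k=2):
                if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                    good.append(pair[0])
            if len(good) < 8:   # a fundamental matrix needs 8+ correspondences
                continue
            p1 = np.float32([kp1[m.queryIdx].pt for m in good])
            p2 = np.float32([kp2[m.trainIdx].pt for m in good])
            _, inliers = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC)
            if inliers is not None:
                score += int(inliers.sum())
    return score
```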
Different frames in the same frame group come from different viewpoints. The visual constraint expresses which real-world content can be seen from a given viewpoint. Within a frame group, the present invention can recover the visual order of each frame. For a given frame group, the visual loss (VL) is defined as follows:
VL = Σ_{i=2}^{k−1} |(O_{i−1} + O_{i+1}) / 2 − O_i|    (11)
Here, k is the number of clusters and O_i denotes the viewpoint rank of the i-th frame. VL is the visual constraint score. The present invention sorts all frame groups in ascending order of the visual constraint score VL.
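Formula (11) itself reduces to a few lines, assuming the viewpoint ranks O_i of the group have been recovered:

```python
def visual_loss(viewpoint_ranks):
    """Formula (11): visual constraint score VL of one frame group; the sum
    runs over the interior frames of the ordered group (lower is better)."""
    o = viewpoint_ranks
    return sum(abs((o[i - 1] + o[i + 1]) / 2 - o[i])
               for i in range(1, len(o) - 1))
```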
For each frame group, the present invention sums its positions in the two orderings above; the frame group with the minimum sum is taken as the keyframe group. If several frame groups attain the same minimum, each of them can be chosen as a keyframe group.
3 Attention-enhanced three-dimensional reconstruction
The present invention proposes an attention-enhanced three-dimensional reconstruction method to improve the reconstruction results. The method of the invention does not require pre-calibrated cameras. Compared with previous three-dimensional reconstruction methods, the attention-enhanced reconstruction method of the invention not only highlights the regions of interest but also saves computation.
First, the present invention recovers the camera parameters of the keyframes automatically by the structure-from-motion method. Then, in each keyframe, corner points are detected with the difference-of-Gaussians (DoG) and Harris detectors. For each keyframe, the region of interest consists of the regions of high visual saliency. Using the frame saliency map, the invention deletes those detected features lying outside the region of interest. Finally, the remaining features are supplied for recovering the three-dimensional information, through a simple match-expand-filter process: 1) initial feature matching: the remaining image features are matched under the epipolar constraint between different frames, forming a sparse patch distribution in the salient regions; given these initial matches, the following two steps are repeated n times; 2) patch expansion: the initial matched patches are spread to their surroundings, obtaining a dense patch distribution; 3) patch filtering: erroneously matched patches are eliminated according to the visual constraint.
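A sketch of the saliency-gated feature detection used in this step; SIFT (a DoG-based detector) and OpenCV's Harris-based goodFeaturesToTrack stand in for the named detectors, and the saliency threshold of 0.5 defining the region of interest is an assumption.

```python
import numpy as np
import cv2

def roi_features(keyframe_gray, saliency, thresh=0.5):
    """Detect corner features, then keep only those lying inside the
    salient region of the frame saliency map (values in [0, 1])."""
    kps = list(cv2.SIFT_create().detect(keyframe_gray, None))
    corners = cv2.goodFeaturesToTrack(keyframe_gray, maxCorners=2000,
                                      qualityLevel=0.01, minDistance=5,
                                      useHarrisDetector=True)
    if corners is not None:
        kps += [cv2.KeyPoint(float(x), float(y), 3.0)
                for x, y in corners.reshape(-1, 2)]
    # Delete features outside the region of interest.
    return [kp for kp in kps
            if saliency[int(kp.pt[1]), int(kp.pt[0])] >= thresh]
```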
4 Implementation results
To evaluate the present invention, two groups of experiments were designed, in an indoor and an outdoor environment respectively. For both groups, the invention reports the test results of the visual attention analysis, the video keyframe extraction, and the three-dimensional reconstruction.
4.1 Visual attention analysis test
Fig. 2 shows the visual attention analysis results in the indoor environment. The two frames in Fig. 2(a) are the attention analysis results of keyframes extracted from different viewpoints. It is evident that neither the static saliency map nor the dynamic saliency map alone can accurately describe the position and contour of the object, whereas the fused saliency map provides a better description.
Fig. 3 shows the visual attention analysis results in the outdoor environment. The two frames in Fig. 3(a) are the attention analysis results of keyframes extracted from different viewpoints. It is evident that neither the static saliency map nor the dynamic saliency map alone can accurately describe the position and contour of the object; in particular, the dynamic saliency map labels too many non-target regions as regions of interest, whereas the fused saliency map provides a better description.
Figs. 2 and 3 demonstrate that the video attention analysis of the present invention is effective.
Fig. 2: visual attention analysis results in the indoor environment. Fig. 2(a) is the original image, Fig. 2(b) the static saliency map, Fig. 2(c) the location saliency map, Fig. 2(d) the dynamic saliency map, and Fig. 2(e) the fused saliency map.
Fig. 3: visual attention analysis results in the outdoor environment. Fig. 3(a) is the original image, Fig. 3(b) the static saliency map, Fig. 3(c) the location saliency map, Fig. 3(d) the dynamic saliency map, and Fig. 3(e) the fused saliency map.
4.2 Keyframe extraction experiment
The keyframe extraction results are shown in Fig. 4 (indoor environment) and Fig. 5 (outdoor environment). The frames outlined in yellow mark the saliency maps of frames whose visual attention results are relatively poor. As can be seen, among the keyframes selected by the present invention only a minority of frames have poor saliency results; the regions of interest of the others are described well.
Fig. 4: video keyframe extraction results in the indoor environment. Fig. 4(a) shows the keyframes extracted with the method of the present invention, and Fig. 4(b) the saliency maps corresponding to Fig. 4(a). The yellow boxes mark the saliency maps of frames whose visual attention results are relatively poor.
Fig. 5: video keyframe extraction results in the outdoor environment. Figs. 5(a) and 5(b) are the keyframes extracted with the method of the present invention, and Figs. 5(c) and 5(d) the corresponding visual attention maps. The thick boxes in Fig. 5(c) mark the attention maps of frames whose visual attention results are relatively poor.
4.3 Evaluation of the three-dimensional reconstruction
The evaluation of the three-dimensional reconstruction covers two aspects: the time taken and the quality of the reconstruction. Table 1 reports the time cost of the reconstruction; Fig. 6 is an example of indoor scene reconstruction and Fig. 7 an example of outdoor scene reconstruction.
As can be seen from Table 1, the method of the present invention has a great advantage in time and saves a large amount of reconstruction time. From the positions outlined in yellow in Figs. 6 and 7 it can be seen that the keyframes extracted by the invention support better three-dimensional reconstruction than keyframes obtained by random sampling. Meanwhile, Figs. 6(e) and 7(e) show that the method of the invention can achieve results similar to three-dimensional reconstruction from a panorama.
Table 1. Time cost of three-dimensional reconstruction

Scene            Original three-dimensional reconstruction    Method of the present invention
Indoor scene     4.3 hours                                    3.5 hours
Outdoor scene    8.7 hours                                    3.5 hours
Fig. 6: an example of indoor scene reconstruction. Fig. 6(a) is the original image; Fig. 6(b) the saliency map of Fig. 6(a); Fig. 6(c) the reconstruction result from video frames selected by random sampling; Fig. 6(d) the reconstruction result from the video keyframes selected by the method of the present invention; and Fig. 6(e) the reconstruction result using the keyframes selected by the invention together with the saliency map of each frame.
Fig. 7: an example of outdoor scene reconstruction. Fig. 7(a) is the original image; Fig. 7(b) the saliency map of Fig. 7(a); Fig. 7(c) the reconstruction result from video frames selected by random sampling; Fig. 7(d) the reconstruction result from the video keyframes selected by the method of the present invention; and Fig. 7(e) the reconstruction result using the keyframes selected by the invention together with the saliency map of each frame.
The above is only a specific embodiment of the present invention, but the protection scope of the invention is not limited thereto. Any variation or substitution that a person familiar with this technology can readily conceive within the technical scope disclosed by the invention shall be covered by the protection scope of the claims of the invention.

Claims (4)

1. An attention-based object three-dimensional reconstruction method, characterized in that it improves the quality of the three-dimensional reconstruction and accelerates it by analyzing the regions of interest in the video frames, comprising the following steps:
Step S1: segment the video used for three-dimensional reconstruction into video frames, analyze the distribution of visual attention in each frame from the static, location, and dynamic aspects, and obtain the corresponding static, location, and dynamic saliency maps; fuse the saliency maps of the three aspects to obtain a video-based saliency map for each frame, the salient regions described by the saliency map being the regions of interest in the three-dimensional reconstruction;
Step S2: cluster all video frames using the GIST global feature, select a candidate keyframe set according to the saliency map generated for each frame, and finally extract the video keyframes used for three-dimensional reconstruction by analyzing geometric and visual constraints;
Step S3: using the video keyframes and their corresponding saliency maps, perform three-dimensional reconstruction only on the salient regions of the frames, so as to obtain an accurate three-dimensional model of the regions of interest and to accelerate the reconstruction.
2. The attention-based object three-dimensional reconstruction method according to claim 1, characterized in that the visual attention analysis of the video comprises static attention analysis, location attention analysis, dynamic attention analysis, and attention fusion;
For each video frame, static attention analysis is performed with a method combining contrast-based and information-theoretic measures, obtaining the static saliency map;
For each video frame, the camera motion is described in the horizontal, vertical, and radial aspects and location attention analysis is performed by integral template matching, obtaining the location saliency map;
For each pair of adjacent video frames, dynamic attention analysis is performed from the perspectives of both the video viewer and the videographer, obtaining the dynamic saliency map of the earlier of the two frames;
For the static, location, and dynamic saliency maps obtained for each frame, attention fusion is performed dynamically: the respective fusion weights are computed from the relation between the means of the static and dynamic saliency maps, finally yielding the fused visual saliency map of each frame.
3. The attention-based object three-dimensional reconstruction method according to claim 1, characterized in that the extraction of the video keyframes used for three-dimensional reconstruction comprises the following steps:
Step S21: first cluster all frames into k cluster classes using the GIST global feature descriptor;
Step S22: for each cluster class, obtain the class saliency map by averaging all saliency maps within the class;
Step S23: compute the distance between the saliency map of each frame in a cluster and the class saliency map, and from each cluster class select the 10% of frames whose distance to the class saliency map is smallest as the candidate keyframe set;
Step S24: form a frame group from any k frames of the candidate keyframe set, provided they come from different classes; rank all frame groups according to the geometric and visual constraints and finally determine the keyframe group.
4. The attention-based object three-dimensional reconstruction method according to claim 1, characterized in that the step of performing three-dimensional reconstruction only on the salient regions of the video frames is as follows:
Step S31: recover the camera parameters of the keyframes automatically by the structure-from-motion method; then detect corner points in each keyframe with the difference-of-Gaussians and Harris detectors; describe the region of interest of each keyframe by its visual saliency values; delete, using the frame saliency map, those detected features that lie outside the region of interest; finally, supply the features lying inside the region of interest for recovering the three-dimensional information;
Step S32: match the image features lying in the regions of interest using the epipolar constraint between two images, thereby forming a sparse patch distribution in the salient regions and obtaining the initial matched patches;
Step S33: repeat n times the expansion of the initial matched patches into their surroundings, obtaining a dense patch distribution;
Step S34: repeat n times, on the dense patch distribution, the elimination of erroneously matched patches according to the visual constraint, realizing the attention-enhanced three-dimensional reconstruction.
CN 201010574274 2010-11-30 2010-11-30 Three-dimensional reconstruction method of target based on attention Pending CN102034267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010574274 CN102034267A (en) 2010-11-30 2010-11-30 Three-dimensional reconstruction method of target based on attention

Publications (1)

Publication Number Publication Date
CN102034267A true CN102034267A (en) 2011-04-27

Family

ID=43887119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010574274 Pending CN102034267A (en) 2010-11-30 2010-11-30 Three-dimensional reconstruction method of target based on attention

Country Status (1)

Country Link
CN (1) CN102034267A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101350101A (en) * 2008-09-09 2009-01-21 北京航空航天大学 Method for auto-registration of multi-amplitude deepness image
CN101651772A (en) * 2009-09-11 2010-02-17 宁波大学 Method for extracting video interested region based on visual attention
CN101877143A (en) * 2009-12-09 2010-11-03 中国科学院自动化研究所 Three-dimensional scene reconstruction method of two-dimensional image group
CN101777059A (en) * 2009-12-16 2010-07-14 中国科学院自动化研究所 Method for extracting landmark scene abstract

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xian Xiao; Changsheng Xu; Yong Rui, "Video Based 3D Reconstruction Using Spatio-Temporal Attention Analysis", Multimedia and Expo, 2010-07-23, pp. 1091-1096 (relevant to claims 1-4) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102496024A (en) * 2011-11-25 2012-06-13 山东大学 Method for detecting incident triggered by characteristic frame in intelligent monitor
CN102496024B (en) * 2011-11-25 2014-03-12 山东大学 Method for detecting incident triggered by characteristic frame in intelligent monitor
CN102768767A (en) * 2012-08-06 2012-11-07 中国科学院自动化研究所 Online three-dimensional reconstructing and locating method for rigid body
CN104951495A (en) * 2014-03-28 2015-09-30 韩国电子通信研究院 Apparatus and method for managing representative video images
CN104951495B (en) * 2014-03-28 2019-02-05 韩国电子通信研究院 Device and method for Management Representative video image
CN104021544B (en) * 2014-05-07 2018-11-23 中国农业大学 A kind of greenhouse vegetable disease monitor video extraction method of key frame, that is, extraction system
CN106372636A (en) * 2016-08-25 2017-02-01 上海交通大学 HOG-TOP-based video significance detection method
CN106875437A (en) * 2016-12-27 2017-06-20 北京航空航天大学 A kind of extraction method of key frame towards RGBD three-dimensional reconstructions
CN109508642A (en) * 2018-10-17 2019-03-22 杭州电子科技大学 Ship monitor video key frame extracting method based on two-way GRU and attention mechanism
CN109508642B (en) * 2018-10-17 2021-08-17 杭州电子科技大学 Ship monitoring video key frame extraction method based on bidirectional GRU and attention mechanism
CN110322453A (en) * 2019-07-05 2019-10-11 西安电子科技大学 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN111105460A (en) * 2019-12-26 2020-05-05 电子科技大学 RGB-D camera pose estimation method for indoor scene three-dimensional reconstruction
CN112805723A (en) * 2020-03-06 2021-05-14 华为技术有限公司 Image processing system and method and automatic driving vehicle comprising system
CN113450459A (en) * 2020-03-25 2021-09-28 北京四维图新科技股份有限公司 Method and device for constructing three-dimensional model of target object
CN113450459B (en) * 2020-03-25 2024-03-22 北京四维图新科技股份有限公司 Method and device for constructing three-dimensional model of target object
CN114598809A (en) * 2022-01-18 2022-06-07 影石创新科技股份有限公司 Method for selecting view angle of panoramic video, electronic device, computer program product and readable storage medium
CN114598809B (en) * 2022-01-18 2024-06-18 影石创新科技股份有限公司 Panoramic video view angle selection method, electronic equipment and readable storage medium
CN116295097A (en) * 2023-02-15 2023-06-23 天津大学 Three-dimensional data set acquisition and evaluation method and device with material universality
CN116295097B (en) * 2023-02-15 2024-01-09 天津大学 Three-dimensional data set acquisition and evaluation method and device with material universality

Similar Documents

Publication Publication Date Title
CN102034267A (en) Three-dimensional reconstruction method of target based on attention
Sheng et al. UrbanLF: A comprehensive light field dataset for semantic segmentation of urban scenes
CN102867188B (en) Method for detecting seat state in meeting place based on cascade structure
Matzen et al. Nyc3dcars: A dataset of 3d vehicles in geographic context
CN108830252A (en) A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic
CN110188835B (en) Data-enhanced pedestrian re-identification method based on generative confrontation network model
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN109543695A (en) General density people counting method based on multiple dimensioned deep learning
CN104978567B (en) Vehicle checking method based on scene classification
CN104166841A (en) Rapid detection identification method for specified pedestrian or vehicle in video monitoring network
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
Chen et al. End-to-end learning of object motion estimation from retinal events for event-based object tracking
Nedović et al. Stages as models of scene geometry
CN104517095B (en) A kind of number of people dividing method based on depth image
CN107481279A (en) A kind of monocular video depth map computational methods
CN101877143A (en) Three-dimensional scene reconstruction method of two-dimensional image group
Liu et al. VisDrone-CC2021: the vision meets drone crowd counting challenge results
CN104835182A (en) Method for realizing dynamic object real-time tracking by using camera
CN109993269A (en) Single image people counting method based on attention mechanism
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN110503078A (en) A kind of remote face identification method and system based on deep learning
CN108280421A (en) Human bodys' response method based on multiple features Depth Motion figure
CN104063871A (en) Method for segmenting image sequence scene of wearable device
CN111680560A (en) Pedestrian re-identification method based on space-time characteristics
Diaz et al. Detecting dynamic objects with multi-view background subtraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110427