CN114973060A - Similarity calculation method and system for mobile video - Google Patents
- Publication number
- CN114973060A (application number CN202210430592.5A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- video
- similarity
- fov
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
Abstract
The invention discloses a method and a system for calculating the similarity of mobile videos, and relates to the field of data mining. The method comprises the following steps: calculating the intersection and union of the visible area of a first video frame and the visible area of a second video frame through a video frame data model; calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common view similarity according to the preset coefficient; converting the first video and the second video into a first video frame sequence and a second video frame sequence, respectively; calculating, based on a longest common subsequence algorithm, the video similarity distance between the first video and the second video by combining the two frame sequences and taking the maximum common view similarity as a weight; and normalizing the video similarity distance to obtain a similarity value, thereby realizing the identification and calculation of mobile video similarity based on the longest common sub-view measure.
Description
Technical Field
The invention relates to the field of data mining, in particular to a method and a system for calculating the similarity of a mobile video.
Background
In the big data era, with the rapid development of mobile sensing technology, massive data are continuously generated. One example is trajectory data: a trajectory is a kind of spatio-temporal data describing the moving path of an object in space, usually represented as a sequence of GPS points such as p_i = (x, y, t), meaning that the object is located at geographic coordinates (x, y) at time t, where x and y denote latitude and longitude, respectively. Another example is geo-tagged video: the number of geo-tagged videos generated by mobile users has grown significantly with devices such as smartphones equipped with GPS, accelerometer, and gyroscope sensors. A geo-tagged video consists of a series of video frames carrying various spatial attributes, including the latitude and longitude at which the video was shot, the view direction, the visible distance, and the angle of the shooting device; these spatial attributes enable us to recognize new and interesting trajectory patterns. Driven by fields such as the Internet of Things and smart city computing, geo-tagged mobile video data have great value and play an important role in many applications, such as trajectory pattern recognition, geographic image classification, traffic flow analysis and prediction, lane planning, and travel hot-spot detection. Compared with the distance between two points or between a point and a trajectory, the distance between trajectories is more complex and involves more factors; trajectory similarity, as a basic algorithmic service, measures the distance between trajectories and provides support for upper-layer applications, so the trajectory similarity problem urgently needs to be solved.
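As a minimal illustration of the two data models described above (the class and field names here are ours, not the patent's), a GPS trajectory point and a geo-tagged video frame can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class TrajectoryPoint:
    """A GPS sample p_i = (x, y, t): the object is at (x, y) at time t."""
    x: float  # latitude
    y: float  # longitude
    t: float  # timestamp in seconds

@dataclass
class GeoVideoFrame:
    """A geo-tagged video frame with the spatial attributes named above."""
    lat: float        # latitude of the shooting position
    lon: float        # longitude of the shooting position
    direction: float  # view direction, degrees clockwise from north
    distance: float   # visible distance of the camera, meters
    angle: float      # horizontal view angle of the device, degrees

# A trajectory is a sequence of GPS points; a geo-tagged video is a
# sequence of frames carrying spatial attributes.
trajectory = [TrajectoryPoint(40.71, -74.00, 0.0),
              TrajectoryPoint(40.72, -74.01, 1.0)]
video = [GeoVideoFrame(40.71, -74.00, 90.0, 50.0, 60.0)]
```

This mirrors the p_i = (x, y, t) representation in the text; real systems would add frame timestamps and sampling rate.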
Disclosure of Invention
The invention aims to solve the technical problem of providing a method and a system for calculating the similarity of a mobile video aiming at the defects of the prior art.
The technical scheme for solving the technical problems is as follows:
a similarity calculation method of a mobile video, comprising:
S1, calculating the intersection and union of the visible area of a first video frame and the visible area of a second video frame through a video frame data model;
S2, calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common view similarity according to the preset coefficient;
S3, converting the first video and the second video into a first video frame sequence and a second video frame sequence, respectively;
S4, calculating the video similarity distance between the first video and the second video based on a longest common subsequence algorithm, by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight;
S5, normalizing the video similarity distance to obtain a similarity value.
The invention has the following beneficial effects: based on a longest common subsequence algorithm, the method combines the first video frame sequence and the second video frame sequence, takes the maximum common view similarity as a weight, calculates the video similarity distance between the first video and the second video, and normalizes that distance to obtain a similarity value, thereby realizing the identification and calculation of mobile video similarity measured by the longest common sub-view.
Further, the method also includes: optimizing the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the S4 specifically includes:
calculating the video similarity distance between the first video and the second video based on the optimized longest common subsequence algorithm, by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight.
The beneficial effect of adopting this further scheme is that it improves the efficiency of similarity calculation, reduces the computation amount, and lowers the computation cost.
Further, the preset method comprises: a longest common subsequence algorithm based on minimum bounding segments, a longest common subsequence algorithm based on the minimum bounding triangle, or a longest common subsequence algorithm based on the minimum bounding rectangle.
The beneficial effect of adopting this further scheme is that: the minimum-bounding-segment variant approximates the visible area; the minimum-bounding-triangle variant accelerates the computation of the FoV region; the minimum-bounding-rectangle variant significantly reduces the computation cost of CVW; together they ensure accurate identification of the mobile video and accurate calculation of its similarity.
Further, the calculating the preset coefficient between the first video frame and the second video frame according to the intersection and the union specifically includes:
calculating the preset coefficient according to a first formula:

CVW(fov_i, fov_j) = |View(fov_i) ∩ View(fov_j)| / |View(fov_i) ∪ View(fov_j)|

where View(fov_i) denotes the visible region of the i-th video frame of the first video and View(fov_j) denotes the visible region of the j-th video frame of the second video.
Further, S5 specifically includes: normalizing the video similarity distance through a second calculation formula to obtain the similarity value; the second calculation formula is:

LCVS_σ(A, B) = 0, if A or B is empty;
LCVS_σ(A, B) = CVW(A.fov_i, B.fov_j) + LCVS_σ(Head(A), Head(B)), if CVW(A.fov_i, B.fov_j) > ε and |i − j| ≤ σ;
LCVS_σ(A, B) = max(LCVS_σ(Head(A), B), LCVS_σ(A, Head(B))), otherwise;
Similarity(A, B, σ) = LCVS_σ(A, B) / min(m, n),

where LCVS_σ(A, B) represents the similarity distance between the first video A and the second video B, i and j represent frame numbers in A and B, CVW(A.fov_i, B.fov_j) represents the similarity of the visible regions of the i-th frame of A and the j-th frame of B, fov_i and fov_j are the visible areas of those frames represented by the FoV model, and Head(A) and Head(B) represent the consecutive video-frame subsequences of A and B without their last frames.
Another technical solution of the present invention for solving the above technical problems is as follows:
A similarity calculation system for mobile videos, comprising: an intersection-union calculation module, a video frame similarity calculation module, a video sequence conversion module, a video sequence similarity calculation module, and a similarity calculation module;
the intersection union calculation module is used for calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through a video frame data model;
the video frame similarity calculation module is used for calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union and determining the maximum common view similarity according to the preset coefficient;
the video sequence conversion module is used for converting the first video and the second video into a first video frame sequence and a second video frame sequence, respectively;
the video sequence similarity calculation module is used for calculating the video similarity distance between the first video and the second video based on a longest common subsequence algorithm, by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight;
the similarity calculation module is used for carrying out normalization processing on the video similarity distance to obtain a similarity value.
The beneficial effects of the invention are: based on a longest common subsequence algorithm, the system combines the first video frame sequence and the second video frame sequence, takes the maximum common view similarity as a weight, calculates the video similarity distance between the first video and the second video, and normalizes that distance to obtain a similarity value, thereby realizing the identification and calculation of mobile video similarity measured by the longest common sub-view.
Further, the system also includes an optimization module, used for optimizing the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the video sequence similarity calculation module is specifically configured to calculate the video similarity distance between the first video and the second video based on the optimized longest common subsequence algorithm, by combining the first video frame sequence and the second video frame sequence and using the maximum common view similarity as a weight.
The beneficial effect of adopting this further scheme is that it improves the efficiency of similarity calculation, reduces the computation amount, and lowers the computation cost.
Further, the preset method comprises: a longest common subsequence algorithm based on minimum bounding segments, a longest common subsequence algorithm based on the minimum bounding triangle, or a longest common subsequence algorithm based on the minimum bounding rectangle.
The beneficial effect of adopting this further scheme is that: the minimum-bounding-segment variant approximates the visible area; the minimum-bounding-triangle variant accelerates the computation of the FoV region; the minimum-bounding-rectangle variant significantly reduces the computation cost of CVW; together they ensure accurate identification of the mobile video and accurate calculation of its similarity.
Further, the video frame similarity calculation module is specifically configured to calculate the preset coefficient according to a first formula:

CVW(fov_i, fov_j) = |View(fov_i) ∩ View(fov_j)| / |View(fov_i) ∪ View(fov_j)|

where View(fov_i) denotes the visible region of the i-th video frame of the first video and View(fov_j) denotes the visible region of the j-th video frame of the second video.
Further, the similarity calculation module is specifically configured to normalize the video similarity distance through a second calculation formula to obtain the similarity value; the second calculation formula is:

LCVS_σ(A, B) = 0, if A or B is empty;
LCVS_σ(A, B) = CVW(A.fov_i, B.fov_j) + LCVS_σ(Head(A), Head(B)), if CVW(A.fov_i, B.fov_j) > ε and |i − j| ≤ σ;
LCVS_σ(A, B) = max(LCVS_σ(Head(A), B), LCVS_σ(A, Head(B))), otherwise;
Similarity(A, B, σ) = LCVS_σ(A, B) / min(m, n),

where LCVS_σ(A, B) represents the similarity distance between the first video A and the second video B, i and j represent frame numbers in A and B, CVW(A.fov_i, B.fov_j) represents the similarity of the visible regions of the i-th frame of A and the j-th frame of B, fov_i and fov_j are the visible areas of those frames represented by the FoV model, and Head(A) and Head(B) represent the consecutive video-frame subsequences of A and B without their last frames.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a flow chart illustrating a method for calculating similarity of a mobile video according to an embodiment of the present invention;
fig. 2 is a block diagram of a similarity calculation system for mobile video according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a FoV visual area model provided by other embodiments of the present invention;
FIG. 4 is a schematic diagram of a FoV common view area model provided by other embodiments of the present invention;
FIG. 5 is a schematic diagram of a FoV model of Geo Video according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of MBS, MBT and MBR optimization algorithms according to other embodiments of the present invention;
FIG. 7 is a schematic diagram of the effect of the number of FoVs provided by other embodiments of the invention;
FIG. 8 is a schematic diagram of the effect of visible distance provided by other embodiments of the present invention;
fig. 9 is a schematic diagram of a runtime comparison provided by other embodiments of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the drawings; the embodiments are provided to illustrate the invention and are not to be construed as limiting its scope.
As shown in fig. 1, a method for calculating similarity of a mobile video according to an embodiment of the present invention includes:
S1, calculating the intersection and union of the visible area of the first video frame and the visible area of the second video frame through the video frame data model, where the visible area of the first video frame is the fov region of the first mobile video, and the visible area of the second video frame is defined analogously. It should be noted that the FoV (field of view) is the angle formed by the two edges of the lens, i.e., the maximum range within which the lens can image the target. If the width W of the imaging plane is fixed, the size of the FoV is determined directly by the focal length: the larger the focal length, the farther the camera sees but the smaller the FoV; the smaller the focal length, the closer it sees and the larger the FoV. The CVW is the intersection of two FoV areas and represents a common view. Table 1 is a parameter index table for the LCVS algorithm:
TABLE 1
In one embodiment, the visible area model may be as follows. As shown in Fig. 3, for the FoV visible-area model of the mobile video (Geo Video), let i be the timestamp, p_i the position of the camera, r_i the visible distance from the camera, θ_i the angle from north to the camera direction, and δ_i the maximum horizontal angle of the camera lens; a video frame with timestamp i together with the set of spatial attributes (p_i, r_i, θ_i, δ_i) forms fov_i = (p_i, r_i, θ_i, δ_i), the model of the FoV visible region shown in Fig. 3.
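A hypothetical sketch of the fov_i = (p_i, r_i, θ_i, δ_i) model: the sector-shaped visible region can be approximated by a polygon built from the camera position plus samples along the sector arc (planar coordinates; the function name and sampling scheme are our own illustrative assumptions, not the patent's):

```python
import math

def fov_polygon(p, r, theta, delta, n_arc=16):
    """Approximate the visible region of fov = (p, r, theta, delta) as a
    polygon: the camera position followed by points sampled along the
    sector arc.  p = (x, y) camera position, r = visible distance,
    theta = view direction in degrees clockwise from north, delta =
    horizontal view angle in degrees."""
    x, y = p
    start = math.radians(theta - delta / 2.0)
    end = math.radians(theta + delta / 2.0)
    pts = [(x, y)]
    for k in range(n_arc + 1):
        a = start + (end - start) * k / n_arc
        # bearings measured from north: north is +y, east is +x
        pts.append((x + r * math.sin(a), y + r * math.cos(a)))
    return pts

# A 60-degree sector looking due east, visible to 50 m.
sector = fov_polygon((0.0, 0.0), 50.0, 90.0, 60.0)
```

Increasing `n_arc` tightens the approximation; the MBS optimization discussed later trades this precision for speed.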
S2, calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common view similarity according to the preset coefficient; it should be noted that, the process of calculating the preset coefficient may include: calculating the preset coefficient according to a first formula:
wherein, View (fov) i ) A visual field, View, (fov) representing the ith video frame of the first video j ) Representing a viewable area of a jth video frame of the second video.
In another embodiment, Head(A) is the fov sequence of A from 1 to m−1 (i.e., Head(A) = {fov_1, fov_2, …, fov_{m−1}}), Head(B) is the fov sequence of B from 1 to n−1 (i.e., Head(B) = {fov_1, fov_2, …, fov_{n−1}}), and σ is the minimum time threshold; the maximum common-view similarity distance is then defined as follows:
For the given FoV sequences of Geo Video A and Geo Video B, the maximum common-view similarity distance equals the common view CVW of the current frames plus the maximum common-view similarity distance of the sequences Head(A) and Head(B), whenever the defined minimum time threshold is met. Table 2 lists the parameters of the maximum common-view similarity distance:
TABLE 2
In another embodiment, the LCVS similarity and distance functions are defined as follows. Given a minimum time threshold σ, the LCVS similarity between A and B is
Similarity(A, B, σ) = LCVS_σ(A, B) / min(m, n),
and the LCVS distance between A and B is
Distance(A, B, σ) = 1 − Similarity(A, B, σ).
The LCVS distance is a metric and satisfies the following three properties:
(i) Distance(A, B, σ) ≥ 0 for all A, B ≠ φ (non-negativity),
(ii) Distance(A, B, σ) = Distance(B, A, σ) (symmetry),
(iii) Distance(A, C, σ) ≤ Distance(A, B, σ) + Distance(B, C, σ) (triangle inequality),
in one embodiment, as shown in table 3, the pseudo code of the LCVS algorithm may include: the LCVS constructs a two-dimensional array to store the comparison results between FoV labeled videos first, then calls the Generation FoV function and generates a series of FoVs by extracting parameters, including the number of edges k, the viewing distance r, the viewing angle δ, and the number of frames per second fps (lines 1-2). The GenerateFoV function is the formula: f ═ F i |p i ,r,θ i δ, 1 ≦ i ≦ m }, we parameterize the mobile video and compute the largest common view subsequence of the mobile video using the Generator FoV function. After array initialization (lines 3-10), the common view between the FoVs is computed cvw using LCVS (line 13). Is connected withThen, the LCVS algorithm computes the largest common view subsequence according to (lines 14-20), if the computed common view CVW is greater than the defined threshold e and the absolute value of the difference between the two view sequence numbers is less than the defined threshold σ, then the latter array element is equal to the previous array element plus the value of the common view CVW; if the above condition is not met, the latter array element is equal to the former array element, and the intersection between the FoVs is filtered out. Finally, the result is normalized to s and returned (line 25).
TABLE 3
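The dynamic program described above can be sketched as follows. This is an illustrative reimplementation of the pseudocode summarized from Table 3, not the patent's own code; the CVW values are assumed to be precomputed into a matrix, and the max-of-neighbors fallback follows the standard LCSS recurrence:

```python
def lcvs(cvw_matrix, eps, sigma):
    """Weighted LCSS over FoV sequences.  cvw_matrix[i][j] holds the
    precomputed common view CVW between fov_i of video A and fov_j of
    video B.  A pair matches when CVW > eps and |i - j| <= sigma; the
    score accumulates CVW weights instead of plain 0/1 counts, and the
    final value is normalized to [0, 1] by the shorter sequence length."""
    m = len(cvw_matrix)
    n = len(cvw_matrix[0]) if m else 0
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = cvw_matrix[i - 1][j - 1]
            if w > eps and abs(i - j) <= sigma:
                d[i][j] = d[i - 1][j - 1] + w  # extend the matched subsequence
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])  # skip one frame
    return d[m][n] / min(m, n) if m and n else 0.0
```

Two videos whose frames overlap perfectly along the diagonal score 1.0; videos with no common view score 0.0, matching the normalized similarity of S5.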
S3, converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
s4, calculating the video similarity distance between the first video frame and the pixel and the second video frame by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight value based on a longest common subsequence algorithm;
It should be noted that the LCVS (Largest Common View Subsequence) algorithm extends the LCSS algorithm (longest common subsequence algorithm) by adding the common view CVW.
In another embodiment, calculating the video similarity distance between the first video and the second video may involve the common view CVW. Given two FoVs, fov_i and fov_j, with visible regions View(fov_i) and View(fov_j), the common view CVW between View(fov_i) and View(fov_j) is defined as:

CVW(fov_i, fov_j) = |View(fov_i) ∩ View(fov_j)| / |View(fov_i) ∪ View(fov_j)|

The common view area model is shown in Fig. 4, and Table 4 gives the parameters of the common view CVW:
TABLE 4
where |View(fov_i) ∩ View(fov_j)| denotes the area of the intersection of the two FoV regions and |View(fov_i) ∪ View(fov_j)| denotes the area of their union.
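The intersection-over-union ratio above can be approximated numerically. The following sketch is our own illustration, using naive grid sampling over sector-shaped FoVs given as ((x, y), r, θ, δ) tuples, rather than the exact sweep-line geometry the patent's cost analysis assumes:

```python
import math

def in_fov(pt, fov):
    """True if planar point pt lies in the sector fov = ((x, y), r,
    theta, delta), with theta the view direction in degrees from north
    and delta the horizontal view angle."""
    (x, y), r, theta, delta = fov
    dx, dy = pt[0] - x, pt[1] - y
    d = math.hypot(dx, dy)
    if d > r:
        return False
    if d == 0.0:
        return True  # the camera position itself is visible
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    diff = abs((bearing - theta + 180.0) % 360.0 - 180.0)
    return diff <= delta / 2.0

def cvw(fov_a, fov_b, step=0.5):
    """Approximate CVW = |A ∩ B| / |A ∪ B| by sampling a grid over the
    joint bounding box of the two sectors."""
    (xa, ya), ra, _, _ = fov_a
    (xb, yb), rb, _, _ = fov_b
    x0, x1 = min(xa - ra, xb - rb), max(xa + ra, xb + rb)
    y0, y1 = min(ya - ra, yb - rb), max(ya + ra, yb + rb)
    inter = union = 0
    y = y0
    while y <= y1:
        x = x0
        while x <= x1:
            a, b = in_fov((x, y), fov_a), in_fov((x, y), fov_b)
            inter += a and b
            union += a or b
            x += step
        y += step
    return inter / union if union else 0.0
```

Identical FoVs yield CVW = 1 and disjoint FoVs yield 0; shrinking `step` trades running time for accuracy, the same trade-off the MBS/MBT/MBR optimizations below make geometrically.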
In another embodiment, Geo Video A consists of m FoVs (i.e., A = {fov_1, fov_2, …, fov_m}) and Geo Video B consists of n FoVs (i.e., B = {fov_1, fov_2, …, fov_n}); the FoV areas of a Geo Video are shown in Fig. 5.
Preferably, in any of the above embodiments, the method further comprises an LCVS algorithm cost and evaluation model: LCVS enumerates all subsequences of Geo Video A with time complexity O(m); for each subsequence it searches the n fovs in Geo Video B with time complexity O(n) and then calculates LCVS(A, B), so the time complexity of the LCVS algorithm is O(n·m).
Assuming k is the number of simplified edges per FoV area (e.g., k = 3 for a triangle and k = 4 for a rectangle), the common view CVW needs to compute the intersection and union of the FoV areas, and the computation cost of the LCVS algorithm can be decomposed into two parts:
(i) computational cost of CVW
(ii) Cumulative cost
The cost of computing the common view CVW is defined as follows:

C_CVW = C_intersect + C_eventpoint + C_area,

where C_intersect represents the cost of computing and sorting the intersecting edges (e.g., 2k edges), C_eventpoint represents the cost of computing the matching event points (e.g., 2k·log 2k + I·log 2k), and C_area represents the cost of computing the FoV intersection regions. The cost of CVW therefore lies between C_cpu·2k·log 2k and C_cpu·(2k·log 2k + I·log 2k), where I is the number of intersection points and C_cpu represents the unit CPU time cost of the computation.
Then, the LCVS algorithm cost is specifically defined as follows:
the LCVS algorithm cost parameter index is shown in table 5:
TABLE 5
S5, normalizing the video similarity distance to obtain a similarity value.
Based on a longest common subsequence algorithm, the method combines the first video frame sequence and the second video frame sequence, takes the maximum common view similarity as a weight, calculates the video similarity distance between the first video and the second video, and normalizes that distance to obtain a similarity value, thereby realizing the identification and calculation of mobile video similarity measured by the longest common sub-view.
Preferably, in any of the above embodiments, the method further comprises: optimizing the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the S4 specifically includes:
calculating the video similarity distance between the first video and the second video based on the optimized longest common subsequence algorithm, by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight.
This improves the efficiency of similarity calculation, reduces the computation amount, and lowers the computation cost.
Preferably, in any of the above embodiments, the presetting method includes: a longest common subsequence algorithm based on minimum boundary segments, a longest common subsequence algorithm of minimum boundary triangles, or a longest common subsequence algorithm of minimum boundary rectangles.
In one embodiment, optimizing the longest common subsequence algorithm by a preset method may include:
through algorithm cost analysis of the LCVS algorithm, it can be known how the main performance bottleneck of the algorithm is how to calculate the FoV region, the intersection of the FoV region and the union of the FoV region. We performed the following experiments to evaluate and optimize the performance of the LCVS algorithm, taking four different approaches:
(i)LCSS
(ii) LCVS using MBS
(iii) LCVS using MBT
(iv) LCVS using MBR
The overall objective is to demonstrate the performance improvement of the three proposed methods (i.e., all approaches except the LCSS baseline) in identifying similar fovs on a data set of mobile videos, and to answer the following two questions (the performance metrics are the time taken by the computation and the accuracy of the output result):
(i) what impact the number of fovs will have on the algorithm?
(ii) What impact the size of the visible distance will have on the algorithm?
To ensure the authenticity and validity of experimental data, we obtained 4000 real driving videos from the BDD100K website in new york, usa. Based on these mobile video data, we have generated two different data sets:
(i) straight line direction FoV
(ii) Random direction FoV
In the FoV data set in the linear direction, the direction of a camera is aligned with the direction of a moving object and is directly obtained from a vehicle event data recorder; in a random direction FoV dataset, the direction of the camera is randomly changed and not aligned with the direction of the moving object, such FoV is obtained from a camera recording of a mobile device (e.g. a smartphone).
In order to improve the efficiency of the LCVS algorithm, the invention provides three alternative methods for reducing the cost of calculating the common view CVW.
The first method approximates the FoV area using minimum bounding segments (MBS): it partitions the FoV sector into triangles of equal size and estimates the area of the FoV region as the sum of the triangle areas, as shown in a in Fig. 6; the MBS method runs in time linear in the number of triangles.
The second method approximates the FoV area using the minimum bounding triangle (MBT), thereby speeding up the computation of the FoV area; as shown in b in Fig. 6, it approximates the FoV region by the minimum bounding triangle of the FoV sector and estimates the area of the FoV region by the area of that triangle.
The third method approximates the FoV area using the minimum bounding rectangle (MBR), a common data representation; as shown in c in Fig. 6, it approximates the FoV region by the minimum bounding rectangle of the FoV sector, which can significantly reduce the computation cost of CVW in special cases (such as a dashboard-camera video).
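A rough sketch of the MBR idea (our own axis-aligned variant for illustration; the patent's rectangles may be oriented with the sector): bound each FoV sector with a rectangle, then approximate CVW by rectangle intersection-over-union, which costs only a few comparisons per pair:

```python
import math

def fov_mbr(p, r, theta, delta, n_arc=64):
    """Axis-aligned bounding rectangle (xmin, ymin, xmax, ymax) of a
    FoV sector, found by sampling the camera position and the arc."""
    x, y = p
    pts = [(x, y)]
    start, end = theta - delta / 2.0, theta + delta / 2.0
    for k in range(n_arc + 1):
        a = math.radians(start + (end - start) * k / n_arc)
        pts.append((x + r * math.sin(a), y + r * math.cos(a)))
    xs = [q[0] for q in pts]
    ys = [q[1] for q in pts]
    return (min(xs), min(ys), max(xs), max(ys))

def rect_cvw(a, b):
    """CVW approximated on rectangles: intersection area / union area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

The rectangle overestimates the sector area, which is why MBR is the cheapest but least accurate of the three variants in the experiments below.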
The scheme approximates the visible area through the longest common subsequence algorithm with minimum bounding segments; accelerates the computation of the FoV region through the longest common subsequence algorithm with the minimum bounding triangle; and significantly reduces the computation cost of CVW through the longest common subsequence algorithm with the minimum bounding rectangle; together these ensure accurate identification of the mobile video and accurate calculation of its similarity.
In one embodiment, evaluating the impact of the number of FoVs and of the visual distance on the run time and output accuracy of the LCVS algorithm may include:
We fix the minimum time threshold σ to 1 and the segment angle in MBS (i.e. the acute angle of each triangle) to 5, as follows:
(i) Assessing the impact of the number of FoVs
The number of FoVs evaluated in this experiment ranges from 1000 to 4000, and the output accuracy is shown in a and b of fig. 7, where a denotes the video data set whose shooting direction is straight ahead and b denotes the video data set whose shooting direction is random.
According to the experimental results shown in fig. 7, our method outperforms the original LCSS, and as the number of FoVs increases, the gap between the output accuracy of LCSS and that of LCVS widens; this occurs because LCSS only considers point-to-point distances to measure the similarity between trajectories. b in fig. 8 shows that the LCVS using MBS performs better than the LCVS using MBT, while MBR performs worse because MBR only roughly estimates the FoV area and sacrifices output accuracy.
(ii) Assessing visual distance effects
The visual distance evaluated in this experiment ranges from 10 to 60 meters, and a and b in fig. 8 show the accuracy of the output results: a represents the video data set whose shooting direction is straight ahead; b represents the video data set whose shooting direction is random.
According to the experimental results, the LCVS using MBS and the LCVS using MBT are superior to the LCVS using MBR and to the other methods. According to b in fig. 8, the LCVS using MBR performs poorly because its error grows with the visual distance on the random-direction data set.
(iii) Evaluating LCVS algorithm runtime
The number of FoVs evaluated in this experiment varies from 1000 to 4000, and the run times are shown in a and b of fig. 9: a represents the video data set whose shooting direction is straight ahead; b represents the video data set whose shooting direction is random.
As shown in the figure, the run time of LCSS is smaller than those of the LCVS using MBR, the LCVS using MBT, and the LCVS using MBS, and the performance gap of the LCVS algorithm widens as the number of FoVs increases, because the three methods we adopt compute the CVW common view to measure the similarity between trajectories, which increases the calculation cost relative to LCSS. However, the accuracy of LCSS is much lower than that of our three methods, so choosing LCVS to identify the similarity of mobile videos is the better choice. In addition, b in fig. 9 shows that the LCVS using MBS is slower than the other methods because MBS requires linear time to compute the CVW common view.
By testing a real vehicle event data recorder data set of 4000 mobile videos, an algorithm that measures the similarity of mobile videos based on the largest common sub-view is realized; a cost evaluation model is used to test the performance of the LCVS algorithm under the MBR, MBS, and MBT methods, ensuring that the LCVS algorithm accurately identifies and calculates the similarity of mobile videos.
Preferably, in any of the above embodiments, the calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union specifically includes:
calculating the preset coefficient according to a first formula:

CVW(A·fov_i, B·fov_j) = Area(View(fov_i) ∩ View(fov_j)) / Area(View(fov_i) ∪ View(fov_j))

wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
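A minimal sketch of this coefficient, assuming the first formula is the Jaccard-style intersection-over-union ratio implied by computing the intersection and union of the two visible areas. The grid-sampling estimator and the `in_fov` sector test below are illustrative stand-ins, not the patent's implementation:

```python
import math

def in_fov(px, py, cam, r, heading, theta):
    """True if point (px, py) lies inside a FoV sector: camera position `cam`,
    visual distance r, facing direction `heading` (radians), opening angle theta."""
    dx, dy = px - cam[0], py - cam[1]
    d = math.hypot(dx, dy)
    if d > r:
        return False
    if d == 0:
        return True
    ang = math.atan2(dy, dx)
    # smallest absolute angular difference to the heading
    diff = abs((ang - heading + math.pi) % (2 * math.pi) - math.pi)
    return diff <= theta / 2.0

def cvw(fov_a, fov_b, step=0.5):
    """Grid-based estimate of CVW = area(A ∩ B) / area(A ∪ B) for two FoV sectors.
    Each fov is a tuple (cam, r, heading, theta)."""
    (ca, ra, _, _), (cb, rb, _, _) = fov_a, fov_b
    # bounding box covering both sectors
    x0 = min(ca[0] - ra, cb[0] - rb); x1 = max(ca[0] + ra, cb[0] + rb)
    y0 = min(ca[1] - ra, cb[1] - rb); y1 = max(ca[1] + ra, cb[1] + rb)
    nx = int(round((x1 - x0) / step)) + 1
    ny = int(round((y1 - y0) / step)) + 1
    inter = union = 0
    for iy in range(ny):
        for ix in range(nx):
            x, y = x0 + ix * step, y0 + iy * step
            a = in_fov(x, y, *fov_a)
            b = in_fov(x, y, *fov_b)
            inter += a and b
            union += a or b
    return inter / union if union else 0.0
```

Identical FoVs yield a coefficient of 1, and non-overlapping FoVs yield 0, as required of the maximum common view similarity.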
Preferably, in any of the above embodiments, S5 specifically includes: normalizing the video similarity distance through a second calculation formula to obtain the similarity value; the second calculation formula is:
wherein LCVS_δ(A, B) represents the similarity distance between the video frames of the first video and of the second video, A represents the first video frames, B represents the second video frames, i represents the frame number of the first video, j represents the frame number of the second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible area of the i-th video frame of the first video represented in the FoV model, fov_j is the visible area of the j-th video frame of the second video represented by the FoV model, head(A) represents a consecutive video-frame subsequence of the first video, and head(B) represents a consecutive video-frame subsequence of the second video.
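The recursive definition above can be sketched as a bottom-up dynamic program in the style of LCSS, where the CVW overlap of two matching frames contributes its weight instead of a constant 1. The threshold δ, the `cvw_fn` callback, and the min-length normalization are assumptions for illustration, since the second formula itself was not legible in the source:

```python
def lcvs(seq_a, seq_b, cvw_fn, delta=0.5):
    """Weighted longest-common-subsequence distance: a pair of frames 'matches'
    when their common-view weight reaches delta, and a match contributes that
    weight (rather than LCSS's constant 1) to the running distance."""
    m, n = len(seq_a), len(seq_b)
    # dp[i][j]: distance over the first i frames of A and the first j frames of B
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = cvw_fn(seq_a[i - 1], seq_b[j - 1])
            if w >= delta:
                dp[i][j] = dp[i - 1][j - 1] + w
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcvs_similarity(seq_a, seq_b, cvw_fn, delta=0.5):
    """Normalize the LCVS distance into [0, 1] by the shorter sequence length
    (an assumed normalization; the patent's exact second formula is not given here)."""
    if not seq_a or not seq_b:
        return 0.0
    return lcvs(seq_a, seq_b, cvw_fn, delta) / min(len(seq_a), len(seq_b))
```

With an exact-match weight function this reduces to ordinary LCSS, which is why the patent compares against LCSS as the baseline.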
In one embodiment, as shown in fig. 2, a similarity calculation system for a mobile video includes: an intersection union calculation module 1101, a video frame similarity calculation module 1102, a video sequence module 1103, a video sequence similarity calculation module 1104 and a similarity calculation module 1105;
the intersection union calculation module 1101 is configured to calculate an intersection and a union of a visible region of the first video frame and a visible region of the second video frame through a video frame data model;
the video frame similarity calculation module 1102 is configured to calculate a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determine a maximum common view similarity according to the preset coefficient;
the video sequence module 1103 is configured to convert the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
the video sequence similarity calculation module 1104 is configured to calculate, based on a longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and using the maximum common view similarity as a weight;
the similarity calculation module 1105 is configured to perform normalization processing on the video similarity distance to obtain a similarity value.
This scheme, based on the longest common subsequence algorithm, combines the first video frame sequence and the second video frame sequence, takes the maximum common view similarity as the weight, calculates the video similarity distance between the first video frame and the second video frame, and normalizes that distance to obtain the similarity value, thereby realizing identification and calculation of mobile-video similarity measured by the largest common sub-view.
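The five modules can be wired together as in the following sketch. `frame_overlap` is a hypothetical stand-in for the common-view (CVW) module, and the min-length normalization is an assumed form of the final step:

```python
from typing import Callable, List, Tuple

FoV = Tuple[float, float, float]  # hypothetical frame descriptor: (x, y, heading)

def frame_overlap(a: FoV, b: FoV) -> float:
    """Stand-in for the CVW module: overlap decays with camera distance only.
    (A real implementation would intersect the two visible sectors.)"""
    d = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return max(0.0, 1.0 - d / 100.0)  # 100 m visual range assumed

def video_similarity(video_a: List[FoV], video_b: List[FoV],
                     overlap: Callable[[FoV, FoV], float] = frame_overlap,
                     delta: float = 0.5) -> float:
    """Pipeline of the modules: per-frame common-view weight -> weighted
    longest-common-subsequence distance -> normalization into [0, 1]."""
    m, n = len(video_a), len(video_b)
    if m == 0 or n == 0:
        return 0.0
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = overlap(video_a[i - 1], video_b[j - 1])
            dp[i][j] = (dp[i - 1][j - 1] + w if w >= delta
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n] / min(m, n)  # assumed min-length normalization
```

A video compared against itself scores 1, and videos whose cameras never share a view score 0, which is the behavior the normalization step is meant to guarantee.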
Preferably, in any of the above embodiments, further comprising: an optimization module, configured to optimize the longest common subsequence algorithm by a preset method to obtain the optimized longest common subsequence algorithm;
the video sequence similarity calculation module is specifically configured to calculate, based on the optimized longest common subsequence algorithm, video similarity distances between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and using the maximum common view similarity as a weight.
This improves the efficiency of the similarity calculation, reduces the computation amount, and reduces the computation cost.
Preferably, in any of the above embodiments, the presetting method includes: a longest common subsequence algorithm based on minimum boundary segments, a longest common subsequence algorithm of minimum boundary triangles, or a longest common subsequence algorithm of minimum boundary rectangles.
The scheme approximates the visible area via the longest common subsequence algorithm with minimum boundary segments; accelerates the calculation of the FoV region via the longest common subsequence algorithm with minimum boundary triangles; and significantly reduces the calculation cost of CVW via the longest common subsequence algorithm with minimum boundary rectangles, thereby ensuring accurate identification of mobile videos and accurate calculation of their similarity.
Preferably, in any of the above embodiments, the video frame similarity calculation module 1102 is specifically configured to calculate the preset coefficient according to a first formula:

CVW(A·fov_i, B·fov_j) = Area(View(fov_i) ∩ View(fov_j)) / Area(View(fov_i) ∪ View(fov_j))

wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
Preferably, in any embodiment above, the similarity calculation module 1105 is specifically configured to perform normalization processing on the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:
wherein LCVS_δ(A, B) represents the similarity distance between the video frames of the first video and of the second video, A represents the first video frames, B represents the second video frames, i represents the frame number of the first video, j represents the frame number of the second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible area of the i-th video frame of the first video represented in the FoV model, fov_j is the visible area of the j-th video frame of the second video represented by the FoV model, head(A) represents a consecutive video-frame subsequence of the first video, and head(B) represents a consecutive video-frame subsequence of the second video. It is understood that some or all of the alternative embodiments described above may be included in some embodiments.
It should be noted that the above embodiments are product embodiments corresponding to the previous method embodiments, and for the description of each optional implementation in the product embodiments, reference may be made to corresponding descriptions in the above method embodiments, and details are not described here again.
The reader should understand that in the description of this specification, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art can combine and combine features of different embodiments or examples and features of different embodiments or examples described in this specification without contradiction.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described method embodiments are merely illustrative: the division of steps is merely a logical functional division, and there may be other division manners in actual implementation; for example, multiple steps may be combined or integrated into another step, or some features may be omitted or not performed.
The above method, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for calculating similarity of a mobile video, comprising:
S1, calculating an intersection and a union of a visible area of a first video frame and a visible area of a second video frame through a video frame data model;
S2, calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining a maximum common view similarity according to the preset coefficient;
S3, converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
S4, calculating, based on a longest common subsequence algorithm, a video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight;
and S5, performing normalization processing on the video similarity distance to obtain a similarity value.
2. The method for calculating the similarity of a mobile video according to claim 1, further comprising: optimizing the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the S4 specifically includes:
and calculating, based on the optimized longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight.
3. The method for calculating the similarity of a mobile video according to claim 2, wherein the preset method comprises: a longest common subsequence algorithm based on a minimum border segment, a longest common subsequence algorithm of a minimum border triangle, or a longest common subsequence algorithm of a minimum border rectangle.
4. The method according to claim 1, wherein the calculating the preset coefficient between the first video frame and the second video frame according to the intersection and the union specifically comprises:
calculating the preset coefficient according to a first formula:
wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
5. The method for calculating the similarity of a mobile video according to claim 1 or 4, wherein the step S5 specifically includes: normalizing the video similarity distance through a second calculation formula to obtain the similarity value; the second calculation formula is:
wherein LCVS_δ(A, B) represents the similarity distance between the video frames of the first video and of the second video, A represents the first video frames, B represents the second video frames, i represents the frame number of the first video, j represents the frame number of the second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible area of the i-th video frame of the first video represented in the FoV model, fov_j is the visible area of the j-th video frame of the second video represented by the FoV model, head(A) represents a consecutive video-frame subsequence of the first video, and head(B) represents a consecutive video-frame subsequence of the second video.
6. A similarity calculation system for a mobile video, comprising: an intersection union calculation module, a video frame similarity calculation module, a video sequence module, a video sequence similarity calculation module, and a similarity calculation module;
the intersection union calculation module is used for calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through a video frame data model;
the video frame similarity calculation module is configured to calculate a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determine a maximum common view similarity according to the preset coefficient;
the video sequence module is used for converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence respectively;
the video sequence similarity calculation module is configured to calculate, based on a longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight;
the similarity calculation module is used for carrying out normalization processing on the video similarity distance to obtain a similarity value.
7. The system for calculating the similarity of mobile videos according to claim 6, further comprising: an optimization module, configured to optimize the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the video sequence similarity calculation module is specifically configured to calculate, based on the optimized longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight.
8. The system for calculating the similarity of mobile videos according to claim 7, wherein the preset method comprises: a longest common subsequence algorithm based on a minimum border segment, a longest common subsequence algorithm of a minimum border triangle, or a longest common subsequence algorithm of a minimum border rectangle.
9. The system according to claim 6, wherein the video frame similarity calculation module is specifically configured to calculate the preset coefficient according to a first formula:
wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
10. The system according to claim 6 or 9, wherein the similarity calculation module is specifically configured to perform normalization processing on the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:
wherein LCVS_δ(A, B) represents the similarity distance between the video frames of the first video and of the second video, A represents the first video frames, B represents the second video frames, i represents the frame number of the first video, j represents the frame number of the second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible area of the i-th video frame of the first video represented in the FoV model, fov_j is the visible area of the j-th video frame of the second video represented by the FoV model, head(A) represents a consecutive video-frame subsequence of the first video, and head(B) represents a consecutive video-frame subsequence of the second video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210430592.5A CN114973060A (en) | 2022-04-22 | 2022-04-22 | Similarity calculation method and system for mobile video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114973060A true CN114973060A (en) | 2022-08-30 |
Family
ID=82979420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210430592.5A Pending CN114973060A (en) | 2022-04-22 | 2022-04-22 | Similarity calculation method and system for mobile video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114973060A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110081043A1 (en) * | 2009-10-07 | 2011-04-07 | Sabol Bruce M | Using video-based imagery for automated detection, tracking, and counting of moving objects, in particular those objects having image characteristics similar to background |
US20140347475A1 (en) * | 2013-05-23 | 2014-11-27 | Sri International | Real-time object detection, tracking and occlusion reasoning |
US20170169297A1 (en) * | 2015-12-09 | 2017-06-15 | Xerox Corporation | Computer-vision-based group identification |
KR102043366B1 (en) * | 2018-11-21 | 2019-12-05 | (주)터보소프트 | Method for measuring trajectory similarity between geo-referenced videos using largest common view |
CN112257595A (en) * | 2020-10-22 | 2021-01-22 | 广州市百果园网络科技有限公司 | Video matching method, device, equipment and storage medium |
CN112559309A (en) * | 2020-12-18 | 2021-03-26 | 无线生活(北京)信息技术有限公司 | Method and device for adjusting page performance acquisition algorithm |
CN112904331A (en) * | 2019-11-19 | 2021-06-04 | 杭州海康威视数字技术股份有限公司 | Method, device and equipment for determining movement track and storage medium |
Non-Patent Citations (3)
Title |
---|
WEI DING et al.: "Measuring similarity between geo-tagged videos using largest common view", Electronics Letters, vol. 55, no. 8, pages 1-2 *
WEI DING et al.: "VVS: Fast Similarity Measuring of FoV-Tagged Videos", IEEE Access, pages 1-12 *
HUANG Kexin: "Research on person re-identification algorithms in occluded scenes", China Master's Theses Full-text Database (Information Science and Technology), no. 3 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11003956B2 (en) | System and method for training a neural network for visual localization based upon learning objects-of-interest dense match regression | |
US9349189B2 (en) | Occlusion resistant image template matching using distance transform | |
CN105628951B (en) | The method and apparatus of speed for measurement object | |
US11151382B2 (en) | Opportunity to view an object in image processing | |
CN111950394B (en) | Method and device for predicting lane change of vehicle and computer storage medium | |
Zhang et al. | Efficient auto-refocusing for light field camera | |
Kocur et al. | Detection of 3D bounding boxes of vehicles using perspective transformation for accurate speed measurement | |
Mao et al. | Uasnet: Uncertainty adaptive sampling network for deep stereo matching | |
Lu et al. | An improved graph cut algorithm in stereo matching | |
CN105608209A (en) | Video labeling method and video labeling device | |
CN112712703A (en) | Vehicle video processing method and device, computer equipment and storage medium | |
Liu et al. | Visual object tracking with partition loss schemes | |
CN112989877A (en) | Method and device for labeling object in point cloud data | |
CN114359361A (en) | Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium | |
Wu et al. | A dynamic infrared object tracking algorithm by frame differencing | |
Zekany et al. | Classifying ego-vehicle road maneuvers from dashcam video | |
Ding et al. | VVS: Fast Similarity Measuring of FoV-Tagged Videos | |
CN112215036B (en) | Cross-mirror tracking method, device, equipment and storage medium | |
CN111259702B (en) | User interest estimation method and device | |
CN114973060A (en) | Similarity calculation method and system for mobile video | |
CN116883981A (en) | License plate positioning and identifying method, system, computer equipment and storage medium | |
Christiansen et al. | Monocular vehicle distance sensor using HOG and Kalman tracking | |
CN116129154A (en) | Image object association method, computer device, and storage medium | |
CN112818743B (en) | Image recognition method and device, electronic equipment and computer storage medium | |
Kalampokas et al. | Performance benchmark of deep learning human pose estimation for UAVs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220830 |