CN114973060A - Similarity calculation method and system for mobile video - Google Patents

Similarity calculation method and system for mobile video

Info

Publication number
CN114973060A
CN114973060A (application CN202210430592.5A)
Authority
CN
China
Prior art keywords
video frame
video
similarity
fov
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210430592.5A
Other languages
Chinese (zh)
Inventor
丁伟
张玮
周岩
史慧玲
刘礼彬
郝昊
杜忠鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202210430592.5A priority Critical patent/CN114973060A/en
Publication of CN114973060A publication Critical patent/CN114973060A/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures


Abstract

The invention discloses a method and a system for calculating the similarity of mobile videos, and relates to the field of data mining. The method comprises the following steps: calculating the intersection and the union of the visible region of a first video frame and the visible region of a second video frame through a video frame data model; calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common-view similarity according to the preset coefficient; converting the first video and the second video into a first video frame sequence and a second video frame sequence respectively; calculating the video similarity distance between the first video and the second video based on a longest common subsequence algorithm, combining the two frame sequences and taking the maximum common-view similarity as a weight; and normalizing the video similarity distance to obtain a similarity value, thereby realizing recognition and calculation of mobile-video similarity based on the maximum common sub-view measure.

Description

Similarity calculation method and system for mobile video
Technical Field
The invention relates to the field of data mining, in particular to a method and a system for calculating the similarity of a mobile video.
Background
In the big-data era, the rapid development of mobile sensing technology continuously produces large volumes of data, such as trajectory data. A trajectory is a kind of spatio-temporal data describing the moving path of an object in space, and is usually represented as a sequence of GPS points such as p_i = (x, y, t), meaning that the object is located at geographic coordinates (x, y) at time t, where x and y denote latitude and longitude respectively.
Likewise, the number of geo-tagged videos generated by mobile users has increased significantly thanks to devices such as smartphones equipped with GPS, accelerometer and gyroscope sensors. A geo-tagged video is composed of a series of video frames carrying spatial attributes, including the latitude and longitude at which the video was shot, the viewing direction, the visible distance, and the angle of the shooting device. These spatial attributes make it possible to recognize new and interesting trajectory patterns. Driven by fields such as the Internet of Things and smart-city computing, geo-tagged mobile video data has great value and plays an important role in many social and business applications, such as trajectory pattern recognition, geographic image classification, traffic flow analysis and prediction, lane planning, and travel hot-spot detection.
Compared with distance measurement between two points, or between a point and a trajectory, distance measurement between trajectories is more complex and must consider more factors. Trajectory similarity serves as a basic algorithmic service: it measures the distance between trajectories and provides support for upper-layer applications, so the trajectory-similarity problem urgently needs to be solved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a method and a system for calculating the similarity of a mobile video aiming at the defects of the prior art.
The technical scheme for solving the technical problems is as follows:
a similarity calculation method of a mobile video, comprising:
s1, calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through the video frame data model;
s2, calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common view similarity according to the preset coefficient;
s3, converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
s4, calculating the video similarity distance between the first video and the second video based on a longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence and taking the maximum common-view similarity as a weight;
and S5, carrying out normalization processing on the video similarity distance to obtain a similarity value.
The invention has the beneficial effects that: based on a longest common subsequence algorithm, the method combines the first video frame sequence and the second video frame sequence, takes the maximum common-view similarity as a weight, calculates the video similarity distance between the first video and the second video, and normalizes this distance to obtain a similarity value, thereby realizing recognition and calculation of mobile-video similarity measured by the maximum common sub-view.
Further, the method also comprises: optimizing the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the S4 specifically includes:
and calculating the video similarity distance between the first video and the second video based on the optimized longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence and taking the maximum common-view similarity as a weight.
The beneficial effect of adopting the further scheme is that: the efficiency of similarity calculation is improved, the amount of computation is reduced, and the computational cost is lowered.
Further, the preset method comprises: a longest common subsequence algorithm based on minimum boundary segments, a longest common subsequence algorithm based on minimum boundary triangles, or a longest common subsequence algorithm based on minimum boundary rectangles.
The beneficial effect of adopting the further scheme is that: the longest common subsequence algorithm based on minimum boundary segments approximates the visible region; the version based on minimum boundary triangles accelerates the calculation of the FoV region; and the version based on minimum boundary rectangles significantly reduces the computational cost of CVW. Together these ensure accurate recognition of mobile videos and accurate calculation of their similarity.
Further, the calculating the preset coefficient between the first video frame and the second video frame according to the intersection and the union specifically includes:
calculating the preset coefficient according to a first formula:

$$CVW(fov_i, fov_j)=\frac{|View(fov_i)\cap View(fov_j)|}{|View(fov_i)\cup View(fov_j)|}$$

wherein View(fov_i) represents the visible region of the i-th video frame of the first video, and View(fov_j) represents the visible region of the j-th video frame of the second video.
Further, the S5 specifically comprises: normalizing the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:

$$LCVS_{\delta}(A,B)=\begin{cases}0, & A=\varnothing\ \text{or}\ B=\varnothing\\ CVW(A.fov_i,B.fov_j)+LCVS_{\delta}(Head(A),Head(B)), & CVW(A.fov_i,B.fov_j)\ge\varepsilon\ \text{and}\ |i-j|\le\delta\\ \max\{LCVS_{\delta}(Head(A),B),\ LCVS_{\delta}(A,Head(B))\}, & \text{otherwise}\end{cases}$$

with the normalized similarity value given by $s=\frac{LCVS_{\delta}(A,B)}{\min(|A|,|B|)}$;

wherein LCVS_δ(A, B) represents the similarity distance between the video frames of the first video and the second video, A represents the first video, B represents the second video, i represents the frame number of the first video, j represents the frame number of the second video, CVW(A.fov_i, B.fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible region of the i-th video frame of the first video represented in the FoV model, fov_j is the visible region of the j-th video frame of the second video represented in the FoV model, Head(A) represents a consecutive video-frame subsequence of the first video, and Head(B) represents a consecutive video-frame subsequence of the second video.
Another technical solution of the present invention for solving the above technical problems is as follows:
a similarity calculation system for mobile videos, comprising: an intersection-union calculation module, a video frame similarity calculation module, a video sequence conversion module, a video sequence similarity calculation module and a similarity calculation module;
the intersection union calculation module is used for calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through a video frame data model;
the video frame similarity calculation module is used for calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union and determining the maximum common view similarity according to the preset coefficient;
the video sequence conversion module is used for converting the first video and the second video into a first video frame sequence and a second video frame sequence respectively;
the video sequence similarity calculation module is used for calculating the video similarity distance between the first video and the second video based on a longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence and taking the maximum common-view similarity as a weight;
the similarity calculation module is used for carrying out normalization processing on the video similarity distance to obtain a similarity value.
The beneficial effects of the invention are: based on a longest common subsequence algorithm, the system combines the first video frame sequence and the second video frame sequence, takes the maximum common-view similarity as a weight, calculates the video similarity distance between the first video and the second video, and normalizes this distance to obtain a similarity value, thereby realizing recognition and calculation of mobile-video similarity measured by the maximum common sub-view.
Further, the system also comprises: an optimization module, used for optimizing the longest common subsequence algorithm by a preset method to obtain an optimized longest common subsequence algorithm;
the video sequence similarity calculation module is specifically configured to calculate, based on the optimized longest common subsequence algorithm, the video similarity distance between the first video and the second video, combining the first video frame sequence and the second video frame sequence and taking the maximum common-view similarity as a weight.
The beneficial effect of adopting the further scheme is that: the efficiency of similarity calculation is improved, the amount of computation is reduced, and the computational cost is lowered.
Further, the preset method comprises the following steps: a longest common subsequence algorithm based on a minimum border segment, a longest common subsequence algorithm of a minimum border triangle, or a longest common subsequence algorithm of a minimum border rectangle.
The beneficial effect of adopting the further scheme is that: the longest common subsequence algorithm based on minimum boundary segments approximates the visible region; the version based on minimum boundary triangles accelerates the calculation of the FoV region; and the version based on minimum boundary rectangles significantly reduces the computational cost of CVW. Together these ensure accurate recognition of mobile videos and accurate calculation of their similarity.
Further, the video frame similarity calculation module is specifically configured to calculate the preset coefficient according to a first formula:

$$CVW(fov_i, fov_j)=\frac{|View(fov_i)\cap View(fov_j)|}{|View(fov_i)\cup View(fov_j)|}$$

wherein View(fov_i) represents the visible region of the i-th video frame of the first video, and View(fov_j) represents the visible region of the j-th video frame of the second video.
Further, the similarity calculation module is specifically configured to normalize the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:

$$LCVS_{\delta}(A,B)=\begin{cases}0, & A=\varnothing\ \text{or}\ B=\varnothing\\ CVW(A.fov_i,B.fov_j)+LCVS_{\delta}(Head(A),Head(B)), & CVW(A.fov_i,B.fov_j)\ge\varepsilon\ \text{and}\ |i-j|\le\delta\\ \max\{LCVS_{\delta}(Head(A),B),\ LCVS_{\delta}(A,Head(B))\}, & \text{otherwise}\end{cases}$$

with the normalized similarity value given by $s=\frac{LCVS_{\delta}(A,B)}{\min(|A|,|B|)}$;

wherein LCVS_δ(A, B) represents the similarity distance between the video frames of the first video and the second video, A represents the first video, B represents the second video, i represents the frame number of the first video, j represents the frame number of the second video, CVW(A.fov_i, B.fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible region of the i-th video frame of the first video represented in the FoV model, fov_j is the visible region of the j-th video frame of the second video represented in the FoV model, Head(A) represents a consecutive video-frame subsequence of the first video, and Head(B) represents a consecutive video-frame subsequence of the second video.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a flow chart illustrating a method for calculating similarity of a mobile video according to an embodiment of the present invention;
fig. 2 is a block diagram of a similarity calculation system for mobile video according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a FoV visual area model provided by other embodiments of the present invention;
FIG. 4 is a schematic diagram of a FoV common view area model provided by other embodiments of the present invention;
FIG. 5 is a schematic diagram of a FoV model of Geo Video according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of MBS, MBT and MBR optimization algorithms according to other embodiments of the present invention;
FIG. 7 is a graphical illustration of the effect of the number of FoVs provided by other embodiments of the invention;
FIG. 8 is a schematic diagram of the effect of visible distance provided by other embodiments of the present invention;
fig. 9 is a schematic diagram of a runtime comparison provided by other embodiments of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth to illustrate, but are not to be construed to limit the scope of the invention.
As shown in fig. 1, a method for calculating similarity of a mobile video according to an embodiment of the present invention includes:
s1, calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through the video frame data model; wherein the visible area of the first video frame is the fov area representing the first moving video; the second viewable area is of a similar type. It should be noted that the FoV is an angle formed by two edges of the lens, which is the maximum range of the lens through which the object image of the target to be measured can pass, and if the width (W) of the imaging plane is fixed, the size of the FoV is directly determined by the Focal Length. The larger the Focal Length, the farther away it is seen, but the smaller the FoV. The smaller the Focal Length, the closer it is seen, the larger the FoV becomes. The CVW is the intersection of two FoV areas, representing a common view. Table 1 is a parameter index table for LCVS algorithm:
TABLE 1
In one embodiment, the visible-region model may be as follows. As shown in FIG. 3, for the FoV visible-region model of the mobile video (Geo Video), let i be the timestamp, p_i the position of the camera, r_i the visible distance of the camera, θ_i the angle from north to the camera direction, and δ_i the maximum horizontal angle of the camera lens. A video frame with timestamp i and its set of spatial attributes (p_i, r_i, θ_i, δ_i) together constitute fov_i = (p_i, r_i, θ_i, δ_i), the model of the FoV visible region shown in FIG. 3.
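The FoV model above can be sketched as a small data structure; the class and field names below are hypothetical, and the `area` method uses the exact circular-sector formula (1/2)·r²·δ with δ in radians:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class FoV:
    """One video frame's spatial attributes fov_i = (p_i, r_i, theta_i, delta_i)."""
    x: float      # camera position p_i, planar x (or longitude)
    y: float      # camera position p_i, planar y (or latitude)
    r: float      # visible distance r_i
    theta: float  # viewing direction theta_i, degrees clockwise from north
    delta: float  # maximum horizontal viewing angle delta_i, degrees

    def area(self) -> float:
        """Exact area of the circular sector covered by this FoV."""
        return math.radians(self.delta) / 2.0 * self.r ** 2

# example frame: camera at the origin, looking east, 50 m range, 60-degree lens
fov = FoV(x=0.0, y=0.0, r=50.0, theta=90.0, delta=60.0)
```

A sequence of such objects, one per frame, then plays the role of the FoV sequence A = {fov_1, …, fov_m} used in the definitions below.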
S2, calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common-view similarity according to the preset coefficient. The process of calculating the preset coefficient may comprise calculating it according to the first formula:

$$CVW(fov_i, fov_j)=\frac{|View(fov_i)\cap View(fov_j)|}{|View(fov_i)\cup View(fov_j)|}$$

wherein View(fov_i) represents the visible region of the i-th video frame of the first video, and View(fov_j) represents the visible region of the j-th video frame of the second video.
In another embodiment, Head(A) is the fov sequence of A from 1 to m-1 (i.e., Head(A) = {fov_1, fov_2, ..., fov_{m-1}}), Head(B) is the fov sequence of B from 1 to n-1 (i.e., Head(B) = {fov_1, fov_2, ..., fov_{n-1}}), and σ is set as the minimum time threshold; the maximum common-view similarity distance is then defined as follows:

$$LCVS_{\sigma}(A,B)=\begin{cases}0, & A=\varnothing\ \text{or}\ B=\varnothing\\ CVW(A.fov_m,B.fov_n)+LCVS_{\sigma}(Head(A),Head(B)), & CVW(A.fov_m,B.fov_n)\ge\varepsilon\ \text{and}\ |m-n|\le\sigma\\ \max\{LCVS_{\sigma}(Head(A),B),\ LCVS_{\sigma}(A,Head(B))\}, & \text{otherwise}\end{cases}$$

That is, the maximum common-view similarity distance of the given FoV sequences of Geo Video A and Geo Video B equals the sum of the common view CVW and the maximum common-view similarity distance of the sequences Head(A) and Head(B) whenever the defined minimum time threshold is met. Table 2 lists the maximum common-view similarity distance parameters:
TABLE 2
In another embodiment, the LCVS similarity and distance functions are as follows. Given a minimum time threshold σ, the LCVS similarity between A and B is defined as:

$$Similarity(A,B,\sigma)=\frac{LCVS_{\sigma}(A,B)}{\min(m,n)}$$

and the LCVS distance between A and B is defined as:

$$Distance(A,B,\sigma)=1-Similarity(A,B,\sigma)$$
LCVS distance is a metric and satisfies the following three properties:

(i) Distance(A,B,σ) ≥ 0 for all A, B ≠ ∅ (non-negativity);

(ii) Distance(A,B,σ) = Distance(B,A,σ) (symmetry);

(iii) Distance(A,C,σ) ≤ Distance(A,B,σ) + Distance(B,C,σ) (triangle inequality).
in one embodiment, as shown in table 3, the pseudocode of the LCVS algorithm may comprise the following. LCVS first constructs a two-dimensional array to store the comparison results between the FoV-labelled videos, then calls the GenerateFoV function, which generates a series of FoVs by extracting parameters including the number of edges k, the viewing distance r, the viewing angle δ, and the number of frames per second fps (lines 1-2). The GenerateFoV function realizes the formula F = {f_i | p_i, r, θ_i, δ, 1 ≤ i ≤ m}; with it we parameterize the mobile video and compute the largest common-view subsequence of the mobile video. After array initialization (lines 3-10), the common view CVW between the FoVs is computed (line 13). The LCVS algorithm then computes the largest common-view subsequence (lines 14-20): if the computed common view CVW is greater than the defined threshold ε and the absolute value of the difference between the two view sequence numbers is less than the defined threshold σ, the current array element equals the previous array element plus the value of the common view CVW; if the above condition is not met, the current array element equals the preceding array element, and the intersection between the FoVs is filtered out. Finally, the result is normalized to s and returned (line 25).
TABLE 3
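Table 3 itself is not reproduced in the source, so the following Python sketch of the dynamic program is reconstructed from the prose description above; the function and parameter names (`lcvs_similarity`, `eps`, `sigma`) and the normalization by min(m, n) are assumptions, and `cvw` is passed in as a function so that any common-view computation (exact, MBS, MBT or MBR) can be plugged in:

```python
def lcvs_similarity(A, B, cvw, eps=0.0, sigma=1):
    """LCVS dynamic program: accumulate CVW weights along a common subsequence.

    A, B  : sequences of FoVs
    cvw   : function (fov_a, fov_b) -> common view in [0, 1]
    eps   : minimum common view for a match (threshold epsilon in the text)
    sigma : maximum allowed frame-index gap (minimum time threshold)
    """
    m, n = len(A), len(B)
    if m == 0 or n == 0:
        return 0.0
    # dp[i][j] = maximal accumulated common view between A[:i] and B[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = cvw(A[i - 1], B[j - 1])
            if w > eps and abs(i - j) <= sigma:
                dp[i][j] = dp[i - 1][j - 1] + w   # extend the matched subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # filter out this pair
    # normalize the accumulated weight to a similarity value in [0, 1]
    return dp[m][n] / min(m, n)

def lcvs_distance(A, B, cvw, eps=0.0, sigma=1):
    return 1.0 - lcvs_similarity(A, B, cvw, eps, sigma)
```

With a `cvw` that returns 1.0 for identical items and 0.0 otherwise, this reduces to the classical normalized LCSS, which matches the statement below that LCVS extends LCSS by adding the CVW weight.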
S3, converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
s4, calculating the video similarity distance between the first video and the second video based on a longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence and taking the maximum common-view similarity as a weight;
it should be noted that the LCVS (Largest Common View Subsequence) algorithm extends the LCSS algorithm (longest common subsequence algorithm) by adding the CVW weight.
In another embodiment, calculating the video similarity distance between the first video and the second video may rely on the common view CVW, defined as follows. Given two FoVs, fov_i and fov_j, with View(fov_i) and View(fov_j) denoting their FoV regions, the common view CVW between View(fov_i) and View(fov_j) is defined as:

$$CVW(fov_i, fov_j)=\frac{|View(fov_i)\cap View(fov_j)|}{|View(fov_i)\cup View(fov_j)|}$$
The common-view area model is shown in FIG. 4, and Table 4 gives the parameters of the common view CVW:
TABLE 4
Wherein |View(fov_i) ∩ View(fov_j)| denotes the area of the intersection of the two FoV regions, and |View(fov_i) ∪ View(fov_j)| denotes the area of their union.
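The intersection-over-union CVW of two sector-shaped FoVs can be estimated without any polygon geometry by Monte Carlo sampling over a shared bounding box; this is only an illustrative sketch (all names are hypothetical), not the patent's exact computation, each FoV being a tuple `(x, y, r, theta, delta)` with angles in degrees:

```python
import math
import random

def in_fov(px, py, fov):
    """Membership test for a sector FoV (x, y, r, theta, delta):
    within visible distance r and within delta/2 degrees of direction theta
    (theta measured clockwise from north)."""
    x, y, r, theta, delta = fov
    dx, dy = px - x, py - y
    if math.hypot(dx, dy) > r:
        return False
    # bearing of the point from the camera, clockwise from north
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    diff = abs((bearing - theta + 180.0) % 360.0 - 180.0)
    return diff <= delta / 2.0

def cvw_estimate(f1, f2, samples=20000, seed=0):
    """Monte Carlo estimate of CVW = |A ∩ B| / |A ∪ B| for two sector FoVs."""
    rng = random.Random(seed)
    xs = [f[0] - f[2] for f in (f1, f2)] + [f[0] + f[2] for f in (f1, f2)]
    ys = [f[1] - f[2] for f in (f1, f2)] + [f[1] + f[2] for f in (f1, f2)]
    inter = union = 0
    for _ in range(samples):
        px = rng.uniform(min(xs), max(xs))
        py = rng.uniform(min(ys), max(ys))
        a, b = in_fov(px, py, f1), in_fov(px, py, f2)
        inter += a and b
        union += a or b
    return inter / union if union else 0.0
```

Identical FoVs give CVW = 1 and fully disjoint FoVs give CVW = 0, which matches the range of the definition above; the estimate for partial overlaps improves with the sample count.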
In another embodiment, Geo Video A consists of m FoVs (i.e., A = {fov_1, fov_2, ..., fov_m}) and Geo Video B consists of n FoVs (i.e., B = {fov_1, fov_2, ..., fov_n}); FIG. 5 shows the FoV regions of a Geo Video.
Preferably, in any of the above embodiments, the LCVS algorithm cost and evaluation model may comprise the following. LCVS enumerates all subsequences of Geo Video A, with time complexity O(m); for each subsequence, LCVS needs to search the n FoVs of Geo Video B, with time complexity O(n), and then computes LCVS(A, B), so the time complexity of the LCVS algorithm is O(n·m).
Assuming k is the number of simplified edges per FoV area (e.g. triangle k is 3 and rectangle k is 4), the common view CVW needs to compute the intersection and union of the FoV areas, and the computation cost of the LCVS algorithm can be decomposed into two parts:
(i) computational cost of CVW
(ii) Cumulative cost
The cost of a single CVW evaluation is defined as follows:

$$C_{cvw}=C_{intersect}+C_{eventpoint}+C_{area}$$

where C_intersect represents the cost of computing and sorting the intersecting edges (e.g., with 2k edges), C_eventpoint represents the cost of computing the matching event points (e.g., 2k·log 2k + I·log 2k), and C_area represents the cost of computing the FoV intersection region. The cost of CVW lies between C_cpu·2k·log 2k and C_cpu·(2k·log 2k + I·log 2k), where I is the number of intersections and C_cpu represents the unit time cost of the view computation.
Then, the LCVS algorithm cost is specifically defined as follows:
$$C_{LCVS}=m\cdot n\cdot C_{cvw}$$
the LCVS algorithm cost parameter index is shown in table 5:
TABLE 5
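The cost bounds above can be turned into a small calculator; this is only an illustrative sketch of the stated formulas (function names and the unit cost default are assumptions), taking the logarithm to base 2:

```python
import math

def cvw_cost_bounds(k, I, c_cpu=1.0):
    """Lower and upper bound on the cost of one CVW evaluation, following
    the text: between c_cpu * 2k*log2(2k) and c_cpu * (2k*log2(2k) + I*log2(2k)),
    where k is the number of simplified edges per FoV region and I is the
    number of intersections."""
    base = 2 * k * math.log2(2 * k)
    return c_cpu * base, c_cpu * (base + I * math.log2(2 * k))

def lcvs_cost_bounds(m, n, k, I, c_cpu=1.0):
    """Total LCVS cost bounds: the dynamic program evaluates CVW m*n times."""
    lo, hi = cvw_cost_bounds(k, I, c_cpu)
    return m * n * lo, m * n * hi
```

With no intersections (I = 0) the two bounds coincide, reflecting that the event-point term vanishes.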
And S5, carrying out normalization processing on the video similarity distance to obtain a similarity value.
Based on a longest common subsequence algorithm, the method combines the first video frame sequence and the second video frame sequence, takes the maximum common-view similarity as a weight, calculates the video similarity distance between the first video and the second video, and normalizes this distance to obtain a similarity value, thereby realizing recognition and calculation of mobile-video similarity measured by the maximum common sub-view.
Preferably, in any of the above embodiments, further comprising: optimizing the longest public subsequence algorithm by a preset method to obtain an optimized longest public subsequence algorithm;
the S4 specifically includes:
and calculating the video similarity distance between the first video and the second video based on the optimized longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence and taking the maximum common-view similarity as a weight.
This improves the efficiency of similarity calculation, reduces the amount of computation and lowers the computational cost.
Preferably, in any of the above embodiments, the presetting method includes: a longest common subsequence algorithm based on minimum boundary segments, a longest common subsequence algorithm of minimum boundary triangles, or a longest common subsequence algorithm of minimum boundary rectangles.
In one embodiment, optimizing the longest common subsequence algorithm by a preset method may include:
through algorithm cost analysis of the LCVS algorithm, it can be known how the main performance bottleneck of the algorithm is how to calculate the FoV region, the intersection of the FoV region and the union of the FoV region. We performed the following experiments to evaluate and optimize the performance of the LCVS algorithm, taking four different approaches:
(i) LCSS
(ii) LCVS using MBS
(iii) LCVS using MBT
(iv) LCVS using MBR
The overall objective is to demonstrate the performance improvement of the three methods we propose (excluding the LCSS algorithm) in identifying similar FoVs on a mobile-video data set, and to answer the following two questions (performance here refers to the time taken by the calculation process and the accuracy of the output result):
(i) What impact does the number of FoVs have on the algorithm?
(ii) What impact does the size of the visible distance have on the algorithm?
To ensure the authenticity and validity of the experimental data, we obtained 4000 real driving videos, captured in New York, USA, from the BDD100K website. Based on these mobile-video data, we generated two different data sets:
(i) straight-line-direction FoV
(ii) random-direction FoV
In the straight-line-direction FoV data set, the camera direction is aligned with the direction of the moving object and is obtained directly from a dashboard camera; in the random-direction FoV data set, the camera direction changes randomly and is not aligned with the direction of the moving object, such FoVs being obtained from recordings made with a mobile device (e.g., a smartphone).
In order to improve the efficiency of the LCVS algorithm, the invention provides three alternative methods for reducing the cost of calculating the common view CVW.
The first method approximates the FoV area using minimum boundary segments (MBS), which partition the FoV region into triangles of the same size and estimate the area of the FoV region as the sum of the triangle areas, as shown in a in FIG. 6; the MBS method takes time linear in the number of triangles.
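As a sketch of the MBS idea, the sector can be cut into k equal triangles whose areas sum to an approximation of the sector area; the function name is hypothetical, and the 5-degree default segment angle follows the experimental setting described below:

```python
import math

def mbs_sector_area(r, delta_deg, segment_angle_deg=5.0):
    """Approximate the area of an FoV sector of radius r and central angle
    delta by summing k isoceles triangles with equal central angles (the MBS
    idea): each triangle contributes 0.5 * r^2 * sin(delta/k)."""
    k = max(1, math.ceil(delta_deg / segment_angle_deg))
    step = math.radians(delta_deg / k)
    return k * 0.5 * r * r * math.sin(step)

# the triangle sum slightly underestimates and converges to the exact
# sector area 0.5 * r^2 * delta (delta in radians) as k grows
exact = 0.5 * 50.0 ** 2 * math.radians(60.0)
approx = mbs_sector_area(50.0, 60.0)
```

With a 5-degree segment angle the relative error for a 60-degree, 50-metre FoV is already well under one percent, which illustrates why the experiments below find MBS the most accurate of the three approximations.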
The second method uses the minimum bounding triangle (MBT) to approximate the FoV area and thereby speed up its calculation; as shown in b in FIG. 6, this method approximates the FoV region by the minimum bounding triangle of the FoV sector and estimates the area of the FoV region as the area of that triangle.
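A sketch of the MBT approximation, under the assumption that the bounding triangle keeps its apex at the camera, with height r along the view bisector and half-base r·tan(δ/2); the function name is hypothetical:

```python
import math

def mbt_area(r, delta_deg):
    """Area of a bounding triangle of an FoV sector with apex at the camera:
    an isoceles triangle of height r along the bisector and half-base
    r * tan(delta/2), i.e. area = r^2 * tan(delta/2). This contains the
    sector and therefore overestimates its area, trading accuracy for an
    O(1) area computation."""
    return r * r * math.tan(math.radians(delta_deg) / 2.0)

tri = mbt_area(50.0, 60.0)  # 2500 * tan(30 degrees), about 1443.4
```

Compared with the exact sector area of about 1309 for the same parameters, the overestimate is roughly 10 percent here, consistent with MBT trailing MBS in accuracy in the experiments below.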
The third method uses the minimum bounding rectangle (MBR), a common data representation, to approximate the FoV area; as shown in c in FIG. 6, this method approximates the FoV region by the minimum bounding rectangle of the FoV sector, which can significantly reduce the computational cost of CVW in special cases (such as a dashboard-camera setting).
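A minimal sketch of the MBR idea: replace each FoV region by an axis-aligned bounding rectangle and compute the common view as a rectangle intersection-over-union, which takes constant time. Here the rectangle is simply the bounding box of the visibility circle, coarser than a true minimum bounding rectangle of the sector; all names are assumptions:

```python
def rect_of_fov(x, y, r):
    """Coarse axis-aligned bounding rectangle of an FoV: the bounding box of
    its visibility circle (a tight MBR of the sector would be smaller)."""
    return (x - r, y - r, x + r, y + r)

def cvw_mbr(rect_a, rect_b):
    """CVW under the MBR approximation: intersection-over-union of two
    axis-aligned rectangles (x1, y1, x2, y2), computable in O(1)."""
    ax1, ay1, ax2, ay2 = rect_a
    bx1, by1, bx2, by2 = rect_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

The constant-time rectangle test replaces the edge-sorting and event-point work in the CVW cost model, which is why MBR is the cheapest but least accurate of the three approximations.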
The scheme approximates the visible region by the longest common subsequence algorithm based on minimum boundary segments; accelerates the calculation of the FoV region by the version based on minimum boundary triangles; and significantly reduces the computational cost of CVW by the version based on minimum boundary rectangles. Together these ensure accurate recognition of mobile videos and accurate calculation of their similarity.
In one embodiment, evaluating the impact of the number of FoVs and of the visible distance on the run time and output accuracy of the LCVS algorithm may comprise the following. We fix the minimum time threshold σ to 1 and the segment angle in MBS (i.e., the acute angle of each triangle) to 5 degrees. The details are as follows:
(i) assessing FoV number impact
The number of FoVs evaluated in this experiment ranges from 1000 to 4000, and output accuracy is shown in a and b in fig. 7, where a denotes the video data set whose shooting direction is straight ahead and b denotes the video data set whose shooting direction is random.
According to the experimental results shown in fig. 7, our method outperforms LCSS in its original form, and as the number of FoVs increases, the accuracy gap between LCSS and LCVS widens; this occurs because LCSS only considers point-to-point distances to measure the similarity between trajectories. b in fig. 7 shows that the LCVS using MBS performs better than the LCVS using MBT, while MBR performs worse, because MBR only roughly estimates the FoV area without regard to output accuracy.
(ii) Assessing visual distance effects
The visual distance evaluated in this experiment ranges from 10 to 60 meters, and a and b in fig. 8 show the accuracy of the output results, where a denotes the video data set whose shooting direction is straight ahead and b denotes the video data set whose shooting direction is random.
According to the experimental results, the LCVS using MBS and the LCVS using MBT are superior to the LCVS using MBR and to the other methods. According to b in fig. 8, the LCVS using MBR performs poorly because its error grows with the visual distance on the random-direction data set.
(iii) Evaluating LCVS algorithm runtime
The number of FoVs evaluated in this experiment ranges from 1000 to 4000, and the running times are shown in a and b in fig. 9, where a denotes the video data set whose shooting direction is straight ahead and b denotes the video data set whose shooting direction is random.
As the figure shows, the running time of LCSS is smaller than those of the LCVS using MBR, the LCVS using MBT and the LCVS using MBS, and the performance gap of the LCVS algorithm widens as the number of FoVs increases, because the three methods we adopt compute the CVW common view to measure the similarity between trajectories, which increases the computation cost relative to LCSS. However, the accuracy of LCSS is much lower than that of our three methods, so choosing LCVS to identify the similarity of mobile videos is the better option. In addition, b in fig. 9 shows that the LCVS using MBS is slower than the other methods, because MBS requires linear time to compute the CVW common view, so its running time exceeds that of the other algorithms.
By testing a real dashboard camera data set of 4000 mobile videos, an algorithm for measuring the similarity of mobile videos based on the largest common sub-view is realized; a cost evaluation model is used to test the performance of the LCVS algorithm with the MBR, MBS and MBT methods, ensuring that the LCVS algorithm accurately identifies and calculates the similarity of mobile videos.
Preferably, in any of the above embodiments, the calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union specifically includes:
calculating the preset coefficient according to a first formula:
CVW(fov_i, fov_j) = Area(View(fov_i) ∩ View(fov_j)) / Area(View(fov_i) ∪ View(fov_j))
wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
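As an illustration of the preset coefficient (the intersection-over-union of two visible regions), the following sketch estimates CVW numerically by sampling a grid over two sector-shaped FoVs. The (x, y, heading, half_angle, radius) tuple layout and the grid step are assumptions made for the example, not part of the invention:

```python
import math

def in_fov(px, py, fov):
    """True if point (px, py) lies inside the sector-shaped FoV given as
    (cam_x, cam_y, heading, half_angle, radius), angles in radians."""
    cam_x, cam_y, heading, half_angle, radius = fov
    dx, dy = px - cam_x, py - cam_y
    if dx * dx + dy * dy > radius * radius:
        return False
    # Smallest signed angle between the point bearing and the heading
    diff = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle

def cvw(fov_a, fov_b, step=0.25):
    """Grid-sampled estimate of the common-view weight
    Area(A ∩ B) / Area(A ∪ B) of two FoV sectors."""
    lo_x = min(fov_a[0] - fov_a[4], fov_b[0] - fov_b[4])
    hi_x = max(fov_a[0] + fov_a[4], fov_b[0] + fov_b[4])
    lo_y = min(fov_a[1] - fov_a[4], fov_b[1] - fov_b[4])
    hi_y = max(fov_a[1] + fov_a[4], fov_b[1] + fov_b[4])
    inter = union = 0
    y = lo_y
    while y <= hi_y:
        x = lo_x
        while x <= hi_x:
            a, b = in_fov(x, y, fov_a), in_fov(x, y, fov_b)
            inter += a and b
            union += a or b
            x += step
        y += step
    return inter / union if union else 0.0
```

The exact geometric intersection of two sectors is more involved; this sampling version trades speed for simplicity, which is precisely the cost the MBS, MBT and MBR approximations are designed to reduce.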
Preferably, in any of the above embodiments, the S5 specifically includes: normalizing the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:
LCVS_δ(A, B) = 0, if A or B is empty;
LCVS_δ(A, B) = CVW(A·fov_i, B·fov_j) + LCVS_δ(Head(A), Head(B)), if CVW(A·fov_i, B·fov_j) > 0 and |i − j| ≤ δ;
LCVS_δ(A, B) = max(LCVS_δ(Head(A), B), LCVS_δ(A, Head(B))), otherwise;
and the similarity value is Similarity(A, B) = LCVS_δ(A, B) / min(n, m), where n and m are the numbers of frames of the first video and the second video.
wherein LCVS_δ(A, B) represents the similarity distance between the first video and the second video, A represents the first video frame sequence, B represents the second video frame sequence, i represents a frame number in the first video, j represents a frame number in the second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible area of the i-th video frame of the first video represented in the FoV model, fov_j is the visible area of the j-th video frame of the second video represented in the FoV model, Head(A) represents the sub-sequence of consecutive video frames of the first video, and Head(B) represents the sub-sequence of consecutive video frames of the second video.
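Computed bottom-up, the recursion corresponds to the standard LCSS dynamic program with the CVW weight accumulated at each matched pair. The sketch below assumes a caller-supplied cvw function and normalizes by min(n, m) following the LCSS convention; both choices are illustrative readings of the formula rather than the invention's exact procedure:

```python
def lcvs(A, B, delta, cvw):
    """Largest-common-view similarity distance between FoV sequences A
    and B: LCSS-style dynamic programming, but each matched frame pair
    contributes its CVW overlap ratio instead of a count of 1.

    A, B  : sequences of FoV descriptors
    delta : maximum allowed frame-index offset for a match
    cvw   : function (fov_a, fov_b) -> overlap ratio in [0, 1]
    """
    n, m = len(A), len(B)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = cvw(A[i - 1], B[j - 1]) if abs(i - j) <= delta else 0.0
            if w > 0.0:
                D[i][j] = D[i - 1][j - 1] + w   # match: accumulate weight
            else:
                D[i][j] = max(D[i - 1][j], D[i][j - 1])  # skip a frame
    return D[n][m]

def lcvs_similarity(A, B, delta, cvw):
    """Normalized similarity value in [0, 1]."""
    if not A or not B:
        return 0.0
    return lcvs(A, B, delta, cvw) / min(len(A), len(B))
```

With cvw replaced by exact equality (weight 1 for identical elements), lcvs degenerates to the plain longest-common-subsequence length, i.e. the LCSS baseline the experiments compare against.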
In one embodiment, as shown in fig. 2, a similarity calculation system for a mobile video includes: an intersection union calculation module 1101, a video frame similarity calculation module 1102, a video sequence module 1103, a video sequence similarity calculation module 1104 and a similarity calculation module 1105;
the intersection union calculation module 1101 is configured to calculate an intersection and a union of a visible region of the first video frame and a visible region of the second video frame through a video frame data model;
the video frame similarity calculation module 1102 is configured to calculate a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determine the maximum common view similarity according to the preset coefficient;
the video sequence module 1103 is configured to convert the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
the video sequence similarity calculation module 1104 is configured to calculate, based on a longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and using the maximum common view similarity as a weight;
the similarity calculation module 1105 is configured to perform normalization processing on the video similarity distance to obtain a similarity value.
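The intersection union calculation module and the video frame similarity calculation module can be pictured with a deliberately simplified model in which each visible region is a disc, so the intersection and union have closed forms. The disc model and the function name are hypothetical; the invention's FoV regions are sectors:

```python
import math

def disc_cvw(a, b):
    """Preset coefficient for two visible regions modeled as discs
    (x, y, r): Area(A ∩ B) / Area(A ∪ B).  A disc is a deliberate
    simplification of the sector-shaped visible region described above."""
    (x1, y1, r1), (x2, y2, r2) = a, b
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                       # disjoint discs
        inter = 0.0
    elif d <= abs(r1 - r2):                # one disc contains the other
        inter = math.pi * min(r1, r2) ** 2
    else:                                  # lens-shaped intersection
        a1 = math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
        a2 = math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
        # Sum of the two circular segments forming the lens
        inter = (r1 * r1 * (a1 - math.sin(2 * a1) / 2)
                 + r2 * r2 * (a2 - math.sin(2 * a2) / 2))
    union = math.pi * (r1 * r1 + r2 * r2) - inter
    return inter / union if union else 0.0
```

The remaining modules then reuse this coefficient exactly as described: the sequence module lines the per-frame coefficients up, the sequence similarity module feeds them into the weighted longest-common-subsequence recursion, and the similarity module normalizes the result.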
Based on a longest common subsequence algorithm, the system combines the first video frame sequence and the second video frame sequence, takes the maximum common view similarity as a weight, calculates the video similarity distance between the first video frame and the second video frame, and normalizes the video similarity distance to obtain a similarity value, thereby realizing identification and calculation of mobile video similarity measured on the largest common sub-view.
Preferably, in any of the above embodiments, further comprising: an optimization module, configured to optimize the longest common subsequence algorithm by a preset method to obtain the optimized longest common subsequence algorithm;
the video sequence similarity calculation module is specifically configured to calculate, based on the optimized longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and using the maximum common view similarity as a weight.
The method and the device improve the efficiency of similarity calculation and reduce both the amount and the cost of computation.
Preferably, in any of the above embodiments, the preset method includes: a longest common subsequence algorithm based on minimum boundary segments, a longest common subsequence algorithm based on the minimum boundary triangle, or a longest common subsequence algorithm based on the minimum boundary rectangle.
The scheme approximates the visible area through the longest common subsequence algorithm based on minimum boundary segments; accelerates the calculation of the FoV region through the longest common subsequence algorithm based on the minimum boundary triangle; and significantly reduces the calculation cost of CVW through the longest common subsequence algorithm based on the minimum boundary rectangle, thereby ensuring accurate identification of mobile videos and accurate calculation of their similarity.
Preferably, in any of the above embodiments, the video frame similarity calculating module 1102 is specifically configured to calculate the preset coefficient according to a first formula:
CVW(fov_i, fov_j) = Area(View(fov_i) ∩ View(fov_j)) / Area(View(fov_i) ∪ View(fov_j))
wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
Preferably, in any embodiment above, the similarity calculation module 1105 is specifically configured to perform normalization processing on the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:
LCVS_δ(A, B) = 0, if A or B is empty;
LCVS_δ(A, B) = CVW(A·fov_i, B·fov_j) + LCVS_δ(Head(A), Head(B)), if CVW(A·fov_i, B·fov_j) > 0 and |i − j| ≤ δ;
LCVS_δ(A, B) = max(LCVS_δ(Head(A), B), LCVS_δ(A, Head(B))), otherwise;
and the similarity value is Similarity(A, B) = LCVS_δ(A, B) / min(n, m), where n and m are the numbers of frames of the first video and the second video.
wherein LCVS_δ(A, B) represents the similarity distance between the first video and the second video, A represents the first video frame sequence, B represents the second video frame sequence, i represents a frame number in the first video, j represents a frame number in the second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of the first video frame and the second video frame, fov_i is the visible area of the i-th video frame of the first video represented in the FoV model, fov_j is the visible area of the j-th video frame of the second video represented in the FoV model, Head(A) represents the sub-sequence of consecutive video frames of the first video, and Head(B) represents the sub-sequence of consecutive video frames of the second video. It is understood that some or all of the alternative embodiments described above may be included in some embodiments.
It should be noted that the above embodiments are product embodiments corresponding to the previous method embodiments, and for the description of each optional implementation in the product embodiments, reference may be made to corresponding descriptions in the above method embodiments, and details are not described here again.
The reader should understand that, in the description of this specification, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine features of different embodiments or examples described in this specification provided they do not contradict each other.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the method embodiments described above are merely illustrative: the division into steps is only a logical functional division, and in actual implementation there may be another division; for example, multiple steps may be combined or integrated into another step, or some features may be omitted or not performed.
The above method, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for calculating similarity of a mobile video, comprising:
s1, calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through the video frame data model;
s2, calculating a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determining the maximum common view similarity according to the preset coefficient;
s3, converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence, respectively;
s4, calculating the video similarity distance between the first video frame and the pixel and the second video frame based on the longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence, and taking the maximum common view similarity as a weight value;
and S5, carrying out normalization processing on the video similarity distance to obtain a similarity value.
2. The method for calculating the similarity of a mobile video according to claim 1, further comprising: optimizing the longest common subsequence algorithm by a preset method to obtain the optimized longest common subsequence algorithm;
the S4 specifically includes:
and calculating the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight value, based on the optimized longest common subsequence algorithm.
3. The method for calculating the similarity of a mobile video according to claim 2, wherein the preset method comprises: a longest common subsequence algorithm based on a minimum border segment, a longest common subsequence algorithm of a minimum border triangle, or a longest common subsequence algorithm of a minimum border rectangle.
4. The method according to claim 1, wherein the calculating the preset coefficient between the first video frame and the second video frame according to the intersection and the union specifically comprises:
calculating the preset coefficient according to a first formula:
CVW(fov_i, fov_j) = Area(View(fov_i) ∩ View(fov_j)) / Area(View(fov_i) ∪ View(fov_j))
wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
5. The method for calculating the similarity of a mobile video according to claim 1 or 4, wherein the step S5 specifically includes: normalizing the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:
LCVS_δ(A, B) = 0, if A or B is empty;
LCVS_δ(A, B) = CVW(A·fov_i, B·fov_j) + LCVS_δ(Head(A), Head(B)), if CVW(A·fov_i, B·fov_j) > 0 and |i − j| ≤ δ;
LCVS_δ(A, B) = max(LCVS_δ(Head(A), B), LCVS_δ(A, Head(B))), otherwise;
and the similarity value is Similarity(A, B) = LCVS_δ(A, B) / min(n, m), where n and m are the numbers of frames of the first video and the second video.
wherein LCVS_δ(A, B) represents the similarity distance between said first video and said second video, A represents said first video frame sequence, B represents said second video frame sequence, i represents a frame number in said first video, j represents a frame number in said second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of said first video frame and said second video frame, fov_i is the visible area of the i-th video frame of said first video represented in the FoV model, fov_j is the visible area of the j-th video frame of said second video represented in the FoV model, Head(A) represents the sub-sequence of consecutive video frames of said first video, and Head(B) represents the sub-sequence of consecutive video frames of said second video.
6. A similarity calculation system for a mobile video, comprising: an intersection union calculation module, a video frame similarity calculation module, a video sequence module, a video sequence similarity calculation module and a similarity calculation module;
the intersection union calculation module is used for calculating the intersection and union of the visual area of the first video frame and the visual area of the second video frame through a video frame data model;
the video frame similarity calculation module is configured to calculate a preset coefficient between the first video frame and the second video frame according to the intersection and the union, and determine a maximum common view similarity according to the preset coefficient;
the video sequence module is used for converting the first video frame and the second video frame into a first video frame sequence and a second video frame sequence respectively;
the video sequence similarity calculation module is used for calculating the video similarity distance between the first video frame and the second video frame based on a longest common subsequence algorithm, combining the first video frame sequence and the second video frame sequence and taking the maximum common view similarity as a weight;
the similarity calculation module is used for carrying out normalization processing on the video similarity distance to obtain a similarity value.
7. The system for calculating the similarity of mobile videos according to claim 6, further comprising: an optimization module, configured to optimize the longest common subsequence algorithm by a preset method to obtain the optimized longest common subsequence algorithm;
the video sequence similarity calculation module is specifically configured to calculate, based on the optimized longest common subsequence algorithm, the video similarity distance between the first video frame and the second video frame by combining the first video frame sequence and the second video frame sequence and using the maximum common view similarity as a weight.
8. The system for calculating the similarity of mobile videos according to claim 7, wherein the preset method comprises: a longest common subsequence algorithm based on a minimum border segment, a longest common subsequence algorithm of a minimum border triangle, or a longest common subsequence algorithm of a minimum border rectangle.
9. The system according to claim 6, wherein the video frame similarity calculation module is specifically configured to calculate the preset coefficient according to a first formula:
CVW(fov_i, fov_j) = Area(View(fov_i) ∩ View(fov_j)) / Area(View(fov_i) ∪ View(fov_j))
wherein View(fov_i) represents the visible area of the i-th video frame of the first video, and View(fov_j) represents the visible area of the j-th video frame of the second video.
10. The system according to claim 6 or 9, wherein the similarity calculation module is specifically configured to perform normalization processing on the video similarity distance through a second calculation formula to obtain a similarity value; the second calculation formula is:
LCVS_δ(A, B) = 0, if A or B is empty;
LCVS_δ(A, B) = CVW(A·fov_i, B·fov_j) + LCVS_δ(Head(A), Head(B)), if CVW(A·fov_i, B·fov_j) > 0 and |i − j| ≤ δ;
LCVS_δ(A, B) = max(LCVS_δ(Head(A), B), LCVS_δ(A, Head(B))), otherwise;
and the similarity value is Similarity(A, B) = LCVS_δ(A, B) / min(n, m), where n and m are the numbers of frames of the first video and the second video.
wherein LCVS_δ(A, B) represents the similarity distance between said first video and said second video, A represents said first video frame sequence, B represents said second video frame sequence, i represents a frame number in said first video, j represents a frame number in said second video, CVW(A·fov_i, B·fov_j) represents the degree of similarity of the visible regions of said first video frame and said second video frame, fov_i is the visible area of the i-th video frame of said first video represented in the FoV model, fov_j is the visible area of the j-th video frame of said second video represented in the FoV model, Head(A) represents the sub-sequence of consecutive video frames of said first video, and Head(B) represents the sub-sequence of consecutive video frames of said second video.
CN202210430592.5A 2022-04-22 2022-04-22 Similarity calculation method and system for mobile video Pending CN114973060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210430592.5A CN114973060A (en) 2022-04-22 2022-04-22 Similarity calculation method and system for mobile video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210430592.5A CN114973060A (en) 2022-04-22 2022-04-22 Similarity calculation method and system for mobile video

Publications (1)

Publication Number Publication Date
CN114973060A true CN114973060A (en) 2022-08-30

Family

ID=82979420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210430592.5A Pending CN114973060A (en) 2022-04-22 2022-04-22 Similarity calculation method and system for mobile video

Country Status (1)

Country Link
CN (1) CN114973060A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110081043A1 (en) * 2009-10-07 2011-04-07 Sabol Bruce M Using video-based imagery for automated detection, tracking, and counting of moving objects, in particular those objects having image characteristics similar to background
US20140347475A1 (en) * 2013-05-23 2014-11-27 Sri International Real-time object detection, tracking and occlusion reasoning
US20170169297A1 (en) * 2015-12-09 2017-06-15 Xerox Corporation Computer-vision-based group identification
KR102043366B1 (en) * 2018-11-21 2019-12-05 (주)터보소프트 Method for measuring trajectory similarity between geo-referenced videos using largest common view
CN112257595A (en) * 2020-10-22 2021-01-22 广州市百果园网络科技有限公司 Video matching method, device, equipment and storage medium
CN112559309A (en) * 2020-12-18 2021-03-26 无线生活(北京)信息技术有限公司 Method and device for adjusting page performance acquisition algorithm
CN112904331A (en) * 2019-11-19 2021-06-04 杭州海康威视数字技术股份有限公司 Method, device and equipment for determining movement track and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110081043A1 (en) * 2009-10-07 2011-04-07 Sabol Bruce M Using video-based imagery for automated detection, tracking, and counting of moving objects, in particular those objects having image characteristics similar to background
US20140347475A1 (en) * 2013-05-23 2014-11-27 Sri International Real-time object detection, tracking and occlusion reasoning
US20170169297A1 (en) * 2015-12-09 2017-06-15 Xerox Corporation Computer-vision-based group identification
KR102043366B1 (en) * 2018-11-21 2019-12-05 (주)터보소프트 Method for measuring trajectory similarity between geo-referenced videos using largest common view
CN112904331A (en) * 2019-11-19 2021-06-04 杭州海康威视数字技术股份有限公司 Method, device and equipment for determining movement track and storage medium
CN112257595A (en) * 2020-10-22 2021-01-22 广州市百果园网络科技有限公司 Video matching method, device, equipment and storage medium
CN112559309A (en) * 2020-12-18 2021-03-26 无线生活(北京)信息技术有限公司 Method and device for adjusting page performance acquisition algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEI DING et al.: "Measuring similarity between geo-tagged videos using largest common view", Electronics Letters, vol. 55, no. 8, pages 1-2 *
WEI DING et al.: "VVS: Fast Similarity Measuring of FoV-Tagged Videos", IEEE Access, pages 1-12 *
HUANG Kexin: "Research on person re-identification algorithms in occluded scenes", China Master's Theses Full-text Database (Information Science and Technology), no. 3 *

Similar Documents

Publication Publication Date Title
US11003956B2 (en) System and method for training a neural network for visual localization based upon learning objects-of-interest dense match regression
US9349189B2 (en) Occlusion resistant image template matching using distance transform
CN105628951B (en) The method and apparatus of speed for measurement object
US11151382B2 (en) Opportunity to view an object in image processing
CN111950394B (en) Method and device for predicting lane change of vehicle and computer storage medium
Zhang et al. Efficient auto-refocusing for light field camera
Kocur et al. Detection of 3D bounding boxes of vehicles using perspective transformation for accurate speed measurement
Mao et al. Uasnet: Uncertainty adaptive sampling network for deep stereo matching
Lu et al. An improved graph cut algorithm in stereo matching
CN105608209A (en) Video labeling method and video labeling device
CN112712703A (en) Vehicle video processing method and device, computer equipment and storage medium
Liu et al. Visual object tracking with partition loss schemes
CN112989877A (en) Method and device for labeling object in point cloud data
CN114359361A (en) Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium
Wu et al. A dynamic infrared object tracking algorithm by frame differencing
Zekany et al. Classifying ego-vehicle road maneuvers from dashcam video
Ding et al. VVS: Fast Similarity Measuring of FoV-Tagged Videos
CN112215036B (en) Cross-mirror tracking method, device, equipment and storage medium
CN111259702B (en) User interest estimation method and device
CN114973060A (en) Similarity calculation method and system for mobile video
CN116883981A (en) License plate positioning and identifying method, system, computer equipment and storage medium
Christiansen et al. Monocular vehicle distance sensor using HOG and Kalman tracking
CN116129154A (en) Image object association method, computer device, and storage medium
CN112818743B (en) Image recognition method and device, electronic equipment and computer storage medium
Kalampokas et al. Performance benchmark of deep learning human pose estimation for UAVs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220830
