CN118118687A - Feature extraction and comparison method for video tamper resistance - Google Patents

Feature extraction and comparison method for video tamper resistance

Info

Publication number
CN118118687A
Authority
CN
China
Prior art keywords: frame, video, data, gray, image data
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202211517585.5A
Other languages
Chinese (zh)
Inventor
吴克松
杨鹏
张晨
李基强
耿晓宇
杨凯
Current Assignee
Beijing Tonghe Shiyi Telecommunication Science And Technology Research Institute Co ltd
Data Communication Science & Technology Research Institute
Xingtang Telecommunication Technology Co ltd
Original Assignee
Beijing Tonghe Shiyi Telecommunication Science And Technology Research Institute Co ltd
Data Communication Science & Technology Research Institute
Xingtang Telecommunication Technology Co ltd
Priority date
Application filed by Beijing Tonghe Shiyi Telecommunication Science And Technology Research Institute Co ltd, Data Communication Science & Technology Research Institute, and Xingtang Telecommunication Technology Co ltd
Priority claimed from application CN202211517585.5A
Publication of CN118118687A


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a feature extraction and comparison method for video tamper resistance, belonging to the technical field of video processing. It addresses the inability of the prior art to simultaneously provide good video quality, effective tamper detection, end-to-end integrity checking from the transmitting end to the receiving end, acceptable computational cost, and real-time performance. At the transmitting end, for each GOP in the video code stream to be transmitted, several frames including the I frame are extracted at uniform intervals and processed to obtain the transmitting-end transformation characteristic value data. At the receiving end, several frames including the I frame are extracted from each GOP of the received video code stream and processed to obtain the receiving-end transformation characteristic value data. The two sets of transformation characteristic value data are then compared to obtain a judgment as to whether the video has been tampered with. The method does not affect the quality of the original video, keeps the amount of comparison information low, transmits and extracts comparison features efficiently, compares effectively, and achieves good tamper-detection validity and real-time performance.

Description

Feature extraction and comparison method for video tamper resistance
Technical Field
The invention relates to the technical field of video processing, in particular to a feature extraction and comparison method for video tamper resistance.
Background
Video data carries a large amount of information and complex, changeable scene images, which makes it highly vulnerable to replacement, falsification, counterfeiting, and similar attacks. With the continuing development of artificial intelligence in particular, key video information can be replaced or falsified by intelligent means that go beyond simple code stream substitution and are far more covert and confusing; in such cases the recipient of the video often cannot tell that it has been tampered with. Video tampering attacks occur mainly in the application payload, which conventional link-level, network-transmission, and even application-layer protection mechanisms find hard to monitor. Because of the sheer volume of video data, conventional integrity verification based on cryptographic techniques struggles to meet timeliness requirements. Moreover, video transmission often involves a small amount of packet loss, or secondary encoding and decoding by an intermediate forwarding platform; both belong to normal service behavior, yet a traditional cryptographic hash would report inconsistent integrity check values and thus produce a false positive. As technology develops, video conferencing and video command systems are spreading across industries, and the security requirements of video systems are becoming increasingly prominent. In video conferencing, video command, and similar applications, video communication is an essential guarantee for dispatch and command, decision delivery, and information exchange; once video pictures are substituted, video content is tampered with, or illegal information is inserted, important work can be disrupted or even spin out of control, with very serious consequences.
Identifying video tampering rapidly, efficiently, and accurately, discovering network attacks as early as possible, and preventing the spread of false information are therefore of great significance.
Video tamper resistance is an important component of video security. It mainly relies on video consistency comparison; common approaches include video watermarking, macroblock-based video feature comparison, and pixel-level direct comparison of decoded video images. Video watermarking is an information-hiding technique in which watermark information is embedded in the video in some form. A watermark-based tamper-resistance scheme embeds the watermark into the source video at the transmitting end and judges whether the video has been tampered with by detecting the watermark at the receiving end. Embedding a watermark, however, modifies the source video. A high-robustness watermark is typically embedded in the low-frequency components of the video and degrades video quality to some extent, while a fragile watermark embedded in the high-frequency components may be filtered out by signal processing such as secondary encoding/decoding or filtering; the watermark then cannot be fully recovered and tamper detection misjudges. Furthermore, embedding a large, widely and uniformly distributed watermark harms video quality, whereas a small, sparsely distributed watermark leaves gaps through which local tampering can escape detection. Macroblock-based video feature comparison generates a macroblock feature map of a video frame from the macroblock structure of the frame picture, including macroblock position, size, and type, and from it derives a video fingerprint feature value for the frame data.
Whether the video has been tampered with in transit is then judged by comparing the video fingerprint feature values at the receiving and transmitting ends. This method needs no video decoding and is computationally efficient, but different video encoders encoding the same video produce somewhat different macroblock characteristics even under the same encoding mode and parameters; once the stream is re-encoded during transmission, the macroblock feature values change, so the method cannot truly achieve end-to-end integrity checking. Pixel-level direct comparison of video images is simple and intuitive and can accurately identify tampering, but as video resolution and dynamic range keep increasing, its computational load grows rapidly and real-time requirements become hard to meet.
In summary, the prior art suffers from the following problems: video-watermark-based methods cannot provide both high video quality and effective tamper detection; the macroblock feature values used by macroblock-based comparison change easily, making end-to-end integrity checking from transmitting end to receiving end difficult; and pixel-level direct comparison of video images is too computationally expensive to meet real-time requirements.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a feature extraction and comparison method for video tamper resistance that solves these problems: the inability of watermark-based methods to combine high video quality with effective tamper detection, the instability of macroblock feature values that makes end-to-end integrity checking difficult in macroblock-based comparison, and the excessive computational load of pixel-level direct comparison that prevents real-time operation.
The aim of the invention is mainly realized by the following technical scheme:
The embodiment of the invention provides a feature extraction and comparison method for video tamper resistance, which comprises the following steps:
S1, at the transmitting end, extracting several frames including the I frame at uniform intervals for each GOP in the video code stream to be transmitted, and processing them to obtain the transmitting-end transformation characteristic value data;
S2, at the receiving end, extracting several frames including the I frame for each GOP in the received video code stream in the same way as in step S1, and processing them in the same way as in step S1 to obtain the receiving-end transformation characteristic value data;
S3, comparing the transmitting-end transformation characteristic value data with the receiving-end transformation characteristic value data to obtain a judgment as to whether the video has been tampered with.
In a further refinement of the above method, extracting several frames including the I frame at uniform intervals for each GOP in the video code stream at the transmitting end and processing them to obtain the transmitting-end transformation characteristic value data comprises:
extracting several frames including the I frame at uniform intervals for each GOP in the video code stream to be transmitted, obtaining the comparison extraction frame data;
obtaining gray image data from the comparison extraction frame data;
performing two-level blocking and sequential splicing on the gray image data to obtain the gray average characteristic image data;
performing the DCT (discrete cosine transform) on the gray average characteristic image data and processing the result to obtain the transmitting-end transformation characteristic value data.
Based on a further improvement of the above method, for each GOP in a video bitstream to be transmitted, several frames including I frames are extracted at even intervals, resulting in comparison extracted frame data, including:
calculating the frame extraction interval ceil(t·f) from the frame extraction time interval t and the video code stream frame rate f, where ceil() is the round-up function;
calculating the number of frames extracted in a GOP as floor(n / ceil(t·f)) from the frame extraction interval ceil(t·f) and the number of frames n contained in one GOP, where floor() is the round-down function;
extracting floor(n / ceil(t·f)) frames at uniform intervals, starting from the first frame, in each GOP of the video code stream to be transmitted, to obtain the comparison extraction frame data.
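As a concrete illustration of the frame-selection arithmetic above, the following sketch computes the extraction stride and the extracted frame indices for one GOP. The function name and the convention that index 0 (the I frame opening the GOP) is always included are assumptions for illustration; the patent does not fix an exact indexing convention.

```python
import math

def extraction_indices(t: float, f: int, n: int) -> list[int]:
    """Indices (0-based) of the frames extracted from one n-frame GOP.

    t -- frame extraction time interval in seconds
    f -- video frame rate
    The stride between extracted frames is ceil(t * f); frame 0, the
    I frame that opens the GOP, is always among the extracted frames.
    """
    stride = math.ceil(t * f)
    return list(range(0, n, stride))

# Example from the text: t = 0.5 s, so roughly two frames per second.
indices = extraction_indices(0.5, 25, 50)  # stride = ceil(12.5) = 13
```

With a 25 fps stream and a 50-frame GOP this yields the frames at indices 0, 13, 26, and 39, the first of which is the I frame.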
Based on a further improvement of the above method, extracting frame data based on the comparison, obtaining gray image data, comprising:
obtaining the data size of the gray data of each frame in the comparison extraction frame data from the resolution W×H of each frame, where W is the width and H is the height; the gray data of each frame in the comparison extraction frame data occupies W×H bytes;
based on that data size, reading the first W×H bytes of each frame image in the comparison extraction frame data as the gray data of the frame, i.e., the gray image data.
In a further refinement of the above method, performing two-level blocking and sequential splicing on the gray image data to obtain the gray average characteristic image data comprises:
performing two-level blocking on the gray image data to first obtain the initial blocks of the I frame and the initial blocks of the P frame, and then processing them to obtain the mean value generation blocks of the I frame and the mean value generation blocks of the P frame;
splicing the mean value generation blocks of the I frame and of the P frame in sequence to obtain the gray average characteristic image data of the I frame and of the P frame, collectively referred to as the gray average characteristic image data.
In a further refinement of the above method, performing two-level blocking on the gray image data to first obtain the initial blocks of the I frame and the initial blocks of the P frame comprises:
uniformly dividing each frame in the gray image data into blocks of size j×j; for an I frame, these uniform blocks are the initial blocks of the I frame, and the number of initial blocks per I frame is floor(W/j)·floor(H/j);
for a P frame, applying interleaved selection to the uniform blocks to obtain the initial blocks of the P frame.
Based on further improvement of the method, the processing to obtain the mean value generation block of the I frame and the mean value generation block of the P frame comprises the following steps:
dividing the initial blocks of the I frame and the initial blocks of the P frame into a plurality of r×r blocks each;
for each r×r block, calculating the gray average characteristic value of the block as p̄ = (1/r²)·Σ_{x=1}^{r} Σ_{y=1}^{r} p_{x,y}, where p_{x,y} is the gray value of the pixel at position (x, y) in the r×r block;
calculating the gray average characteristic values of all r×r blocks in an initial block and splicing them in sequence to obtain a block of size i×i, namely the mean value generation block, where i = j/r;
thereby obtaining all mean value generation blocks of each I frame and all mean value generation blocks of each P frame.
In a further refinement of the above method, performing the DCT transform on the gray average characteristic image data and processing the result to obtain the transmitting-end transformation characteristic value data comprises:
performing the DCT transform on the gray average characteristic image data to obtain transformed gray image data;
quantizing the transformed gray image data to obtain the transformation-compressed gray image data;
extracting and processing the transformation-compressed gray image data to obtain the transmitting-end transformation characteristic value data; the transformation characteristic value data comprises a transformation characteristic value for each frame, and the transformation characteristic value of each frame is formed by splicing the transformation characteristic values of its initial blocks in sequence.
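The patent states that the quantized DCT data of each initial block is scanned into a character string, but it does not disclose the quantizer or the string encoding. The following sketch substitutes a pHash-style reduction, thresholding each coefficient against the block median, purely as an illustrative stand-in; the function names and the one-bit-per-coefficient rule are assumptions, not the patented scheme.

```python
import numpy as np

def block_feature_string(dct_block: np.ndarray) -> str:
    """Reduce one transformed block to a character string.

    Hypothetical quantizer: each DCT coefficient is compared with the
    block median, yielding one '0'/'1' character per coefficient.
    """
    bits = (dct_block > np.median(dct_block)).astype(np.uint8)
    return "".join(map(str, bits.flatten().tolist()))

def frame_feature(dct_blocks) -> str:
    # Splice the per-block strings in initial-block order to form the
    # transformation characteristic value of the whole frame.
    return "".join(block_feature_string(b) for b in dct_blocks)
```

The resulting fixed-length bit strings are cheap to transmit and compare directly with the Hamming distance used in step S3.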
Based on a further improvement of the method, performing DCT on the gray-scale average characteristic image data to obtain transformed gray-scale image data, including:
performing the DCT transform on each frame in the gray average characteristic image data as F(u, v) = c(u)·c(v)·Σ_{x=0}^{i-1} Σ_{y=0}^{i-1} p_{x,y}·cos[(2x+1)uπ/(2i)]·cos[(2y+1)vπ/(2i)], with c(0) = √(1/i) and c(u) = √(2/i) for u > 0, where u and v are the matrix subscripts of the frequency-domain result of the DCT transform and p_{x,y} is the pixel value, i.e., the gray average characteristic value, at position (x, y) in the gray average characteristic image data;
and summarizing the DCT transform results of the gray average characteristic image data of every frame to obtain the transformed gray image data.
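The orthonormal 2-D DCT applied to each block can be sketched with plain NumPy, with no transform library assumed; `dct2` is a hypothetical helper name for this sketch.

```python
import numpy as np

def dct2(block: np.ndarray) -> np.ndarray:
    """Orthonormal 2-D DCT-II of a square n x n block.

    Built from the DCT matrix C with C[u, x] = sqrt(2/n) *
    cos((2x+1)*u*pi/(2n)) and the u = 0 row scaled to sqrt(1/n),
    so that F = C @ block @ C.T.
    """
    n = block.shape[0]
    x = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos((2 * x[None, :] + 1) * x[:, None] * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ block @ c.T
```

A quick sanity check: for a constant block, all energy lands in the DC coefficient F(0, 0), and since C is orthogonal the transform preserves total energy.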
In a further refinement of the above method, comparing the transmitting-end transformation characteristic value data with the receiving-end transformation characteristic value data to obtain a judgment as to whether the video has been tampered with comprises:
comparing, for each corresponding frame, the transmitting-end and receiving-end transformation characteristic value data by Hamming distance, and:
if the Hamming distances of the transformation characteristic values of all initial blocks are larger than the judgment threshold, judging that video replacement has occurred;
if the Hamming distances of the transformation characteristic values of some initial blocks are larger than the judgment threshold while those of the remaining initial blocks are not, judging that local video tampering has occurred;
if the Hamming distances of the transformation characteristic values of all initial blocks are no larger than the judgment threshold, judging that neither video tampering nor video replacement has occurred.
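The three-way decision rule above can be sketched directly over per-block feature strings. The helper names and the string return labels are illustrative assumptions; the patent defines only the three outcomes, not their representation.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length feature strings."""
    return sum(x != y for x, y in zip(a, b))

def judge(sender_blocks, receiver_blocks, threshold: int) -> str:
    """Three-way decision of step S3 over per-initial-block strings.

    'replaced' -- every block's distance exceeds the threshold;
    'tampered' -- only some blocks exceed it (local tampering);
    'intact'   -- no block exceeds it.
    """
    over = [hamming(s, r) > threshold
            for s, r in zip(sender_blocks, receiver_blocks)]
    if all(over):
        return "replaced"
    if any(over):
        return "tampered"
    return "intact"
```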
Compared with the prior art, the invention has at least one of the following beneficial effects:
1. The invention performs video tamper-resistance comparison by extracting image gray characteristic values; the original video need not be modified, so its quality is unaffected.
2. When extracting video frames, the invention accounts for the GOP length on the basis of a time interval, ensuring that the I frame of each GOP is extracted while the remaining frames are extracted at uniform time intervals. Extracting frames at intervals improves detection efficiency while preserving the effectiveness of tamper detection, and the time interval can be set to meet live-broadcast service requirements so that tampering is discovered in time.
3. Based on gray-scale compression of the video image, the invention performs initial blocking of the video image to improve parallelism; it selects all initial blocks of I frames but selects the initial blocks of P frames in a parity-interleaved manner, reducing the amount of comparison information.
4. For each initial block, a mean value generation block is obtained by gray averaging; the blocks are spliced in sequence and then DCT-transformed and quantized, further compressing the video image information and improving transmission efficiency.
5. Based on the processed transformation-compressed gray image, the invention scans the initial blocks one by one in sequence to obtain a character string, takes that string as the characteristic value of the initial block, and splices the characteristic values of all initial blocks in sequence to obtain the transformation characteristic value of the video image frame, achieving high extraction efficiency and a good comparison effect.
In the invention, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a flow chart of a feature extraction and comparison method for video tamper resistance according to the present invention;
FIG. 2 is a schematic diagram of a video feature extraction method according to the present invention;
FIG. 3 (a) is a schematic diagram of the initial block selection method for the odd-numbered P frames after an I frame according to the present invention;
FIG. 3 (b) is a schematic diagram of the initial block selection method for the even-numbered P frames after an I frame according to the present invention;
FIG. 4 is a flowchart of a video feature extraction method according to the present invention;
fig. 5 is a schematic diagram of a video feature comparison method according to the present invention.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
Video tamper resistance is concerned with the consistency of video image features; it need not attend to texture, color, contrast, or other detailed image information. Tampering with a video image is never just a change to one pixel or even one macroblock: once tampering occurs, it is reflected across all macroblocks within a local region of the image.
Example 1
In one embodiment of the present invention, a feature extraction and comparison method for video tamper resistance is disclosed, as shown in fig. 1, comprising:
S1, as shown in FIG. 2, processing is performed at the transmitting end in the following manner to obtain the transmitting-end transformation characteristic value data:
extracting a plurality of frames including I frames at even intervals for each GOP in a video code stream to be transmitted to obtain comparison extraction frame data;
Extracting frame data based on the comparison to obtain gray image data;
performing secondary blocking and sequential splicing processing on the gray image data to obtain gray average characteristic image data;
and performing DCT (discrete cosine transform) on the gray average characteristic image data, and processing to obtain transmitting end transformation characteristic value data.
S2, at the receiving end, for each GOP in the received video code stream, extracting the comparison extraction frame data corresponding to the transmitting end and processing it in the same manner as in step S1 to obtain the receiving-end transformation characteristic value data.
S3, comparing the transmitting-end transformation characteristic value data with the receiving-end transformation characteristic value data to obtain a judgment as to whether the video has been tampered with.
Compared with the prior art, the embodiment of the invention performs video tamper-resistance comparison by extracting image gray characteristic values, without modifying the original video or affecting its quality. The comparison accounts for the GOP length on the basis of a time interval, ensuring that the I frame of each GOP is extracted while the remaining frames are extracted at uniform time intervals. Extracting frames at intervals improves detection efficiency while preserving the effectiveness of tamper detection, and the time interval can be set to meet live-broadcast service requirements so that tampering is discovered in time.
Example 2
The optimization is performed on the basis of the embodiment 1, and the step S1 can be further refined into the following steps:
S11, at the video transmitting end, extracting several frames including the I frame at uniform intervals for each GOP of the video code stream, obtaining the comparison extraction frame data.
In a video code stream, a group of consecutive frame pictures beginning with a key frame, i.e., an I frame, forms a GOP; a GOP contains a picture sequence consisting of an I frame, P frames, and so on. Once video tampering occurs, it persists in time and is reflected in the video frames as an inconsistency spanning multiple consecutive images.
Because the application scenarios of the invention are remote live scenarios such as video conferencing and video command, where the quality and security of the video code stream are guaranteed through a relatively high detection frequency, the embodiment of the invention extracts several frames including the I frame at uniform intervals for each GOP of the code stream at the transmitting end, rather than extracting only the I frames, whose spacing may be large; the frame extraction interval can be adjusted according to the video frame rate. If the video is replaced, this is recognized regardless of the extraction ratio; if the video image is locally tampered with, the tampering is reflected in several consecutive frames.
It should be noted that a GOP may contain multiple I frames, although usually it contains only one; a GOP can, however, only begin with an I frame. The embodiment of the invention therefore extracts at uniform intervals starting from the first frame of each GOP, i.e., an I frame, which guarantees that the comparison extraction frame data of the GOP contains an I frame; any I frames in the GOP other than the first frame need not be specially extracted, and whether they are extracted is determined by the frame interval.
Specifically, in the embodiment of the invention the frame extraction time interval is set to t. If the frame rate of the video code stream is f, the frame extraction interval is ceil(t·f), where ceil() is the round-up function; if a GOP contains n frames, the number of frames extracted in the GOP is floor(n / ceil(t·f)), where floor() is the round-down function.
For each GOP of the video code stream, the transmitting end extracts floor(n / ceil(t·f)) frames at uniform intervals starting from the first frame, comprising the first-frame I frame and several subsequent P frames, to obtain the comparison extraction frame data.
Extracting on a per-GOP basis has the advantage of guaranteeing that the video frame image corresponding to the first-frame I frame is extracted, because when video tampering occurs the I frame is the most likely target.
By way of example, the embodiment of the invention sets the frame extraction time interval to 0.5 s, so that two frames are extracted per second, and by combining the comparison results of the two frames, whether tampering exists can be judged within slightly more than 1 s. If the frame rate of the video code stream is f, the frame extraction interval is ceil(0.5·f); if a GOP contains n frames, the number of frames extracted in the GOP is floor(n / ceil(0.5·f)).
S12, obtaining gray image data from the comparison extraction frame data.
In the video image represented in YUV format, Y represents luminance information of a pixel, U represents chrominance information of the pixel, and V represents saturation information of the pixel.
Since the Y information alone can display a complete image, with each pixel represented by one byte, and an image composed entirely of Y information is a gray-scale image, the U and V information need not be considered when performing tamper detection on the video image.
YUV formats fall into two main classes, planar and packed. In the planar format the Y values of all pixels are stored first, followed by all U values and then all V values; in the packed format the Y, U, and V values of each pixel are stored interleaved.
The embodiment of the invention adopts the planar format, and the resolution of each frame image in the comparison extraction frame data is denoted W×H, where W is the width and H is the height. Since each pixel of the gray image occupies one byte, the first W×H bytes of each frame image in the comparison extraction frame data are read into a buffer; the data in the buffer is the required Y data, i.e., the gray image data. The purpose of graying the video image is to simplify the video image matrix, i.e., the image expressed in YUV, and to increase the operation speed.
The gray image data thus consists of the gray data of each frame image in the comparison extraction frame data.
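The Y-plane read just described might look like the following, assuming a raw YUV420p stream with no container. The patent only fixes the planar format, not the chroma subsampling, so the 4:2:0 frame layout and the helper name `read_gray_frames` are assumptions of this sketch.

```python
def read_gray_frames(path: str, width: int, height: int):
    """Read the Y (luma) plane of every frame from a raw planar YUV file.

    Assumed layout (raw YUV420p, no container): each frame stores
    width*height Y bytes followed by width*height//4 U bytes and
    width*height//4 V bytes. Only the first width*height bytes per
    frame are kept -- exactly the gray image data the method compares.
    """
    frame_size = width * height * 3 // 2   # Y + U + V per frame
    frames = []
    with open(path, "rb") as fh:
        while True:
            frame = fh.read(frame_size)
            if len(frame) < frame_size:
                break
            frames.append(frame[: width * height])  # Y plane only
    return frames
```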
S13, performing two-level blocking on the gray image data to obtain the initial blocks of the I frame and P frame images, and then the mean value generation blocks of the I frame and P frame images.
Intuitively, detecting whether an image has been tampered with calls for a pixel-by-pixel comparison of two frames, but this is not actually necessary. If an image is tampered with, the tampered region is far larger than a macroblock; extracting characteristic values of suitable granularity while still guaranteeing that tampering is discovered is the key to the algorithm design.
The second level partitions include an initial partition and a mean generation partition.
Specifically, the embodiment of the invention uniformly divides each frame image in the gray image data into blocks of size j×j. For an I frame image these blocks are the initial blocks, so the number of initial blocks obtained from each frame image is floor(W/j)·floor(H/j).
For the P frame image, in order to further improve the comparison efficiency, on the basis of the uniform block, an interleaving selection mode is adopted to obtain an initial block of the P frame image:
For the odd-numbered P frame images extracted after the I frame image (the 1st, 3rd, 5th, and so on), initial blocks are selected by the method shown in Fig. 3 (a), the selected blocks being drawn in black; for the even-numbered P frame images (the 2nd, 4th, 6th, and so on), the selection method shown in Fig. 3 (b) is used, the selected blocks again being drawn in black.
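Figures 3 (a) and 3 (b) are not reproduced in this text, so the following sketch assumes the two selection patterns are complementary checkerboards over the block grid; the helper name and the (row + col) parity rule are illustrative assumptions consistent with the parity-interleaved selection described.

```python
def select_p_blocks(rows: int, cols: int, p_index: int):
    """(row, col) coordinates of the initial blocks kept for the
    p_index-th (1-based) P frame after the I frame.

    Assumed pattern: odd P frames keep blocks where (row + col) is
    even, even P frames keep the complement, so successive P frames
    jointly cover the grid while each carries only half the blocks.
    """
    parity = 0 if p_index % 2 == 1 else 1
    return [(r, c) for r in range(rows) for c in range(cols)
            if (r + c) % 2 == parity]
```

For the 15 × 8 block grid of the 1080P example this keeps 60 of the 120 initial blocks per P frame.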
When performing mean value generation blocking, the embodiment of the invention takes all initial blocks of the I frame image, and the initial blocks obtained from the P frame image after interleaved selection in the manner shown in Fig. 2, and averages the pixels within each initial block over sub-blocks of a set size r×r; r can be adjusted according to the required feature extraction precision and compression ratio. For each r×r block within an initial block of an I frame or P frame image, the gray average characteristic value of the block is calculated as p̄ = (1/r²)·Σ_{x=1}^{r} Σ_{y=1}^{r} p_{x,y}, where p_{x,y} is the gray value of the pixel at position (x, y) in the r×r block. In a gray image each pixel carries a gray value in the range 0-255.
Calculating the gray-average feature values of all r×r sub-blocks of an initial block yields a new block of size i×i (with i = j/r), namely the mean-generation block. The mean-generation block is greatly reduced in size, which further reduces the amount of computation.
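The reduction of a j×j initial block to an i×i mean-generation block is a straightforward block-average; a minimal NumPy sketch (function name is illustrative):

```python
import numpy as np

def mean_generation_block(block, r):
    """Reduce a j x j initial block to a (j//r) x (j//r) mean-generation
    block: each output value is the mean gray value of one r x r sub-block."""
    j = block.shape[0]
    assert block.shape == (j, j) and j % r == 0
    i = j // r
    # Reshape into (i, r, i, r) so axes 1 and 3 index pixels inside each
    # r x r sub-block, then average over those two axes.
    return block.reshape(i, r, i, r).mean(axis=(1, 3))
```

For j = 128 and r = 16 this yields the 8 × 8 block described in the example below.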
Illustratively, in 1080p video processing the width W is 1920 and the height is 1080; in actual encoding the height H is generally padded to 1088. With j = 128, r = 16 and i = 8, segmenting the I frame image yields 15 × 8 = 120 initial blocks, and averaging each initial block over 16 × 16 sub-blocks gives mean-generation blocks of size 8 × 8 = 64 values. The reduction ratio relative to the initial block image is therefore 256:1. The P frame image additionally selects only an interleaved half of the blocks, so its reduction ratio is 512:1.
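The arithmetic of this 1080p example can be verified directly:

```python
# Parameters from the 1080p example (H padded to 1088 in actual encoding).
W, H, j, r = 1920, 1088, 128, 16

blocks_w, blocks_h = W // j, H // j            # 15 and 8 blocks per axis
initial_blocks_I = blocks_w * blocks_h         # I frame: 15 * 8 = 120 blocks

i = j // r                                     # mean-generation block side: 8
mean_block_pixels = i * i                      # 8 * 8 = 64 values per block

# Each r x r sub-block (256 pixels) collapses to one value -> 256:1.
reduction_I = (j * j) // mean_block_pixels
# Interleaved selection keeps only half of the P-frame blocks -> 512:1.
reduction_P = reduction_I * 2
```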
For each I frame, the mean-generation blocks are spliced together in sequence to obtain gray-average feature image data of size (W·i/j) × (H·i/j). For P frames, the number of blocks is halved relative to I frames, so the resulting feature image is correspondingly half the size.
The gray-scale average feature image data of the I-frame image and the gray-scale average feature image data of the P-frame image are collectively referred to as gray-scale average feature image data.
S14, DCT transformation is carried out on the gray level average characteristic image data, and the characteristic image data is obtained after processing.
To further compress the amount of information and facilitate transmission of the video image feature data, a DCT is applied to the gray-average feature image data of each frame. The general form of the DCT is:

F(u, v) = c(u)·c(v) · Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} p_{x,y} · cos[(2x+1)uπ/(2N)] · cos[(2y+1)vπ/(2N)], where c(0) = √(1/N) and c(k) = √(2/N) for k > 0;
wherein u, v are the matrix indices of the frequency-domain result of the DCT, and p_{x,y} denotes the pixel value, i.e., the gray-average feature value corresponding to pixel (x, y) in the gray-average feature image data;
in an exemplary embodiment of the present invention, the highly efficient 8 × 8 DCT is used, in which case the transform becomes:

F(u, v) = (1/4)·c(u)·c(v) · Σ_{x=0}^{7} Σ_{y=0}^{7} p_{x,y} · cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16], where c(0) = 1/√2 and c(k) = 1 for k > 0.
This transform removes correlation by concentrating the signal energy in the frequency domain, yielding the transformed gray image data.
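The 8 × 8 transform above is the standard orthonormal 2-D DCT-II; a minimal sketch built from its basis matrix:

```python
import numpy as np

def dct2_8x8(block):
    """2-D 8x8 DCT-II in the normalized form given above:
    F = B @ P @ B.T, where B[u, x] = c(u) * cos((2x+1) * u * pi / 16)."""
    N = 8
    x = np.arange(N)
    c = np.full(N, np.sqrt(2.0 / N))
    c[0] = np.sqrt(1.0 / N)
    basis = c[:, None] * np.cos((2 * x[None, :] + 1) * x[:, None] * np.pi / (2 * N))
    return basis @ np.asarray(block, dtype=float) @ basis.T
```

Because the basis is orthonormal, a constant block maps entirely to the DC coefficient F(0, 0), and total energy is preserved (Parseval), which is what makes the subsequent quantization step an effective compressor.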
And quantizing the converted gray image data to obtain converted compressed gray image data.
The transform-compressed gray image data of all blocks are spliced together in sequence and serve as the feature image of the video image frame. This processing is applied to every I frame image and P frame image to obtain the feature image data, which comprises a feature image for each I frame image and P frame image.
S15, based on the characteristic image data, obtaining transmitting end conversion characteristic value data through extraction and processing.
The video image characteristic value refers to characteristic data capable of representing a video image, and generation of the video image characteristic value is the basis and key of an efficient video consistency comparison method.
Each transform-compressed gray image in a frame's feature image is scanned block by block in initial-block order; each block yields a character string, which serves as the transform feature value of that initial block. The transform feature values of all initial blocks are then spliced together in sequence to obtain the transform feature value of the video image frame.
It will be appreciated that the transform feature value of each video image frame comprises the transform feature values of a plurality of initial blocks spliced in sequence.
And carrying out the processing on the transformation compressed gray level image of each I frame image and P frame image in the characteristic image data to obtain the transformation characteristic value data of the transmitting end.
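The per-block string extraction and splicing can be sketched as follows. The text does not specify the string encoding, so the hex serialization below is an assumption; only the block-order scan and sequential concatenation are taken from the description:

```python
def frame_feature_value(blocks):
    """Splice per-block strings into the frame's transform feature value.

    `blocks` is a sequence of quantized (transform-compressed) blocks given
    in initial-block scan order, each a 2-D list of small integers. The hex
    encoding of each block is an assumption; the source only states that
    each block yields a character string.
    """
    parts = []
    for b in blocks:
        flat = [int(v) & 0xFF for row in b for v in row]  # row-major scan
        parts.append(bytes(flat).hex())
    return ''.join(parts)  # sequential splice over all initial blocks
```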
The foregoing process may be summarized as shown in fig. 4.
It is noted that the transmitting end must complete extraction of the sender transform feature value data while transmitting the video code stream; once extracted, the feature value data is transmitted in the extension field of the RTP header. If a video frame is large and must be carried in multiple RTP packets, the extracted feature information of that frame is appended to the last packet of the frame, i.e., the packet with mark = 1.
Preferably, step S2 may be further refined as follows:
Processing is carried out at the receiving end according to the following steps to obtain the data of the transformation characteristic value of the receiving end:
extracting a plurality of frames including the I frame at uniform intervals from each GOP in the received video code stream to obtain comparison extracted frame data;
obtaining gray image data based on the comparison extracted frame data;
performing two-level blocking and sequential splicing on the gray image data to obtain gray-average feature image data;
and performing DCT (discrete cosine transform) on the gray average characteristic image data, processing to obtain characteristic image data, and extracting to obtain receiving end transformation characteristic value data.
It is noted that the receiving end reads the extension field of the RTP header via the video image feature value reception and extraction method shown in Fig. 4; while acquiring the sender transform feature value data, it can also determine which video frames the transmitting end extracted for comparison, enabling fast matching and comparison. Because the sender transform feature value data is attached to its corresponding frame, there is no frame-matching mismatch between the transmitting and receiving ends.
Preferably, step S3 may be further refined as follows:
In the embodiment of the invention, as shown in Fig. 5, the video feature value comparison uses the Hamming method to compare the sender transform feature value data with the receiver transform feature value data. The Hamming distance is an efficient measure of the difference between two binary vectors: the two feature values are XORed, and the resulting Hamming distance is the feature difference; the larger the value, the higher the likelihood that the video has been tampered with.
Specifically, for a pair of video image frames being compared, the sender and receiver transform feature values are examined per initial block: if the Hamming distances of the transform feature values of all initial blocks exceed the decision threshold, video replacement is judged to have occurred; if the Hamming distances of some initial blocks exceed the threshold while those of the remaining initial blocks fall below it, local tampering of the video is judged to have occurred; if the Hamming distances of all initial blocks are greater than 0 but below the threshold, it is judged that neither video tampering nor video replacement has occurred, since re-encoding/decoding, packet loss, and the like can introduce a small amount of distortion.
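The three-way decision can be sketched directly (function names are illustrative; the threshold is the decision threshold T discussed below):

```python
def hamming(a, b):
    """Bit-level Hamming distance between two equal-length byte strings:
    XOR the values and count the set bits."""
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def judge(sender_blocks, receiver_blocks, threshold):
    """Classify a frame pair from per-initial-block Hamming distances:
    all blocks above threshold -> replacement; some above -> local
    tampering; otherwise intact (small re-encoding/packet-loss distortion
    is tolerated below the threshold)."""
    d = [hamming(s, r) for s, r in zip(sender_blocks, receiver_blocks)]
    if all(x > threshold for x in d):
        return 'video replacement'
    if any(x > threshold for x in d):
        return 'local tampering'
    return 'no tampering'
```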
Based on this method, it can be quickly and effectively determined whether the video content has been tampered with and which local area of the video image was altered; the method is simple, fast and effective, low in cost, easy to deploy, and occupies little bandwidth.
In the embodiment of the invention, a larger j means larger initial blocks, and a smaller i means coarser sampling granularity of the pixels; the ratio i/j thus represents the accuracy of the algorithm. If the tamper-comparison sensitivity is denoted α, then α is determined by this accuracy through a sensitivity coefficient k. The Hamming-distance comparison threshold, i.e., the decision threshold T, is then set as a function of α and R, where R denotes the resolution of the comparison image.
Compared with the prior art, the embodiment of the invention compresses the video image to grayscale and performs initial blocking of the video image to improve parallelism; all initial blocks of the I frame are selected, while parity interleaving is used to select the initial blocks of the P frames, reducing the amount of comparison information. For each initial block, the mean-generation block is obtained by gray-average calculation; the blocks are spliced in sequence and then DCT-transformed and quantized, further compressing the video image information and improving transmission efficiency. Based on the processed transform-compressed gray image, the blocks are scanned one by one in initial-block order to obtain character strings used as the feature values of the initial blocks, and the feature values of all initial blocks are spliced together in order to obtain the transform feature value of the video image frame, giving high extraction efficiency and good comparison performance.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be implemented by a computer program instructing associated hardware, where the program may be stored on a computer-readable storage medium such as a magnetic disk, an optical disk, a read-only memory, or a random-access memory.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (10)

1. A feature extraction and comparison method for video tamper resistance, comprising the steps of:
S1, uniformly extracting a plurality of frames including an I frame at intervals for each GOP in a video code stream to be transmitted at a transmitting end, and processing to obtain transmitting end conversion characteristic value data;
s2, extracting a plurality of frames including an I frame in the same way as in the step S1 for each GOP in the received video code stream at the receiving end, and obtaining conversion characteristic value data of the receiving end through processing in the same way as in the step S1;
S3, comparing the transmission end transformation characteristic value data with the receiving end transformation characteristic value data to obtain a judging result about whether the video is tampered.
2. The feature extraction and comparison method for video tamper resistance according to claim 1, wherein extracting a plurality of frames including an I frame at a uniform interval for each GOP in a video bitstream at a transmitting end, and processing to obtain transmitting-end transform feature value data, comprises:
extracting a plurality of frames including I frames at even intervals for each GOP in a video code stream to be transmitted to obtain comparison extraction frame data;
obtaining gray image data based on the comparison extracted frame data;
performing secondary blocking and sequential splicing processing on the gray image data to obtain gray average characteristic image data;
and performing DCT (discrete cosine transform) on the gray average characteristic image data, and processing to obtain transmitting end transformation characteristic value data.
3. The feature extraction and comparison method for video tamper resistance according to claim 2, wherein extracting a number of frames including an I frame at even intervals for each GOP in a video code stream to be transmitted, to obtain comparison extracted frame data, comprises:
based on the frame extraction time interval t and the video code stream frame rate f, calculating to obtain a frame extraction interval ceil (tf); where ceil () represents an upward rounding function;
Based on the frame extraction interval ceil(tf) and the number of frames n contained in one GOP, the number of frames extracted in the GOP is calculated as floor(n / ceil(tf)), wherein floor() is a downward rounding function;
extracting floor(n / ceil(tf)) frames at uniform intervals starting from the first frame of each GOP in the video code stream to be transmitted, to obtain the comparison extracted frame data.
4. A feature extraction and comparison method for video tamper resistance according to claim 2 or 3, wherein obtaining gray image data based on the comparison extracted frame data comprises:
obtaining the data size of the gray data of each frame in the comparison extracted frame data based on the resolution W·H of each frame, wherein W is the width, H is the height, and the gray data of each frame in the comparison extracted frame data is W·H bytes;
based on the data size, reading the first W·H bytes of each frame image in the comparison extracted frame data as the gray data of that frame, i.e., the gray image data.
5. The method for extracting and comparing features of video tamper-proofing according to claim 2, wherein performing two-level blocking and sequential stitching processing on the gray scale image data to obtain gray scale average feature image data, comprises:
performing secondary segmentation processing on the gray image data to obtain an initial segmentation of an I frame and an initial segmentation of a P frame, and processing to obtain an average value generation segmentation of the I frame and an average value generation segmentation of the P frame;
And respectively splicing the average value generation blocks of the I frames and the average value generation blocks of the P frames in sequence to obtain gray level average characteristic image data of the I frames and gray level average characteristic image data of the P frames, which are collectively called as gray level average characteristic image data.
6. The method for video tamper-resistant feature extraction and comparison of claim 5, wherein performing a two-stage segmentation process on the grayscale image data to obtain an initial segmentation of an I frame and an initial segmentation of a P frame, comprises:
uniformly dividing each frame in the gray image data into uniform blocks of size j×j, wherein for an I frame the corresponding uniform blocks are the initial blocks of the I frame, and the number of initial blocks of each I frame is (W/j) × (H/j);
And for the P frame, based on the uniform blocks, performing interleaving selection processing to obtain initial blocks of the P frame.
7. The method for video tamper-resistant feature extraction and comparison of claim 5, wherein the processing results in an I-frame mean generation block and a P-frame mean generation block, comprising:
Dividing the initial block of the I frame and the initial block of the P frame respectively, and obtaining a plurality of blocks with r-r sizes in each initial block;
for each r×r block, calculating the gray-average feature value of the r×r block as follows:

p̄ = (1/r²) · Σ_{x=0}^{r−1} Σ_{y=0}^{r−1} p_{x,y},

wherein p_{x,y} represents the gray value of the pixel at position (x, y) in the r×r block;
calculating the gray-average feature values of all r×r blocks in the initial block and splicing them in sequence to obtain a block of size i×i, namely the mean-generation block, wherein i = j/r;
And obtaining all average value generation blocks of the I frame of each frame and all average value generation blocks of the P frame of each frame.
8. The feature extraction and comparison method for video tamper resistance according to claim 6 or 7, wherein DCT transforming the gray-scale average feature image data, and processing the same to obtain sender transformed feature value data, comprises:
DCT transformation is carried out on the gray level average characteristic image data to obtain transformed gray level image data;
after the converted gray image data is quantized, the processed converted compressed gray image data is obtained;
Based on the transformation compressed gray image data, obtaining transformation characteristic value data of a transmitting end through extraction and processing; wherein the transformation characteristic value data includes a transformation characteristic value of each frame; the transformation characteristic value of each frame comprises a plurality of transformation characteristic values of initial blocks formed by sequential splicing.
9. The method for video tamper-resistant feature extraction and comparison of claim 8, wherein performing DCT on the gray-scale averaged feature image data to obtain transformed gray-scale image data, comprising:
the DCT transform is performed for each frame in the gray-average feature image data as follows:

F(u, v) = c(u)·c(v) · Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} p_{x,y} · cos[(2x+1)uπ/(2N)] · cos[(2y+1)vπ/(2N)], where c(0) = √(1/N) and c(k) = √(2/N) for k > 0,

wherein u, v are the matrix indices of the frequency-domain result of the DCT, and p_{x,y} denotes the pixel value, i.e., the gray-average feature value corresponding to pixel (x, y) in the gray-average feature image data;
And summarizing DCT conversion results of each frame of gray level average characteristic image data to obtain converted gray level image data.
10. The feature extraction and comparison method for video tamper resistance according to claim 9, wherein comparing the transmitting-side transformed feature value data and the receiving-side transformed feature value data to obtain a determination result as to whether the video is tampered, comprises:
and comparing the transformation characteristic value data of the transmitting end with the transformation characteristic value data of the receiving end of the corresponding frame by using a Hamming method, and if:
if the hamming distances of the transformation characteristic values of all the initial blocks are larger than the judgment threshold, judging that video replacement has occurred;
The Hamming distance of the transformation characteristic values of part of the initial blocks is larger than a judging threshold value, and the Hamming distances of the characteristic values of other initial blocks are smaller than the judging threshold value, so that the video local tampering is judged to have occurred;
and if the hamming distances of the characteristic values of all the initial blocks are larger than 0 and smaller than the judgment threshold, judging that video tampering or video replacement does not occur.
Publications (1)

Publication Number Publication Date
CN118118687A true CN118118687A (en) 2024-05-31
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination