WO2016082277A1 - A video authentication method and apparatus - Google Patents

A video authentication method and apparatus

Info

Publication number
WO2016082277A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
hash code
authenticated
video
scene
Prior art date
Application number
PCT/CN2014/095369
Other languages
English (en)
French (fr)
Inventor
孙威
吴金勇
王军
Original Assignee
安科智慧城市技术(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安科智慧城市技术(中国)有限公司 filed Critical 安科智慧城市技术(中国)有限公司
Publication of WO2016082277A1 publication Critical patent/WO2016082277A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835Generation of protective data, e.g. certificates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2347Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving video stream encryption

Definitions

  • the present invention relates to the field of communication information security technologies, and in particular, to a video authentication method and apparatus.
  • the video is generally transmitted to the user through a common transmission channel.
  • the video may be maliciously tampered with, for example, the video is deleted, or the frame is rearranged, or the content in the frame image is modified.
  • when the video involves public safety, politics, the military, or court evidence, its credibility must be evaluated, which has given rise to video authentication techniques.
  • the video authentication technology is a key technology in the video information security evaluation and early warning system. The technology generates a discernable identifier by processing the video content, and the video content security is authenticated by the identifier.
  • Current video authentication is mainly implemented by digital watermarking technology or perceptual hashing technology.
  • the content distinguishing feature is obtained by processing the original video, and the distinguishing feature is used to form a video fingerprint describing the video content.
  • the video fingerprint of the video to be authenticated is obtained by the same method as for the original video, and authentication of the video to be authenticated is then implemented with a similarity comparison algorithm. For example, each frame image is randomly divided into mutually overlapping blocks, each block is numbered, the average brightness difference is calculated for blocks having a preset relationship, and a structure hash vector and a time hash sequence are generated.
  • the structure hash vector determines a structure hash distance, the time hash sequence determines a time hash distance, and the two distances are weighted to obtain the hash distance between the original video and the video to be authenticated.
  • the hash distance is compared with a set threshold to determine whether the video has been tampered with.
  • this method can only determine whether the video has been tampered with, and cannot determine the specific location of the tampering.
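  • The prior-art matching step described above can be sketched roughly as follows; the weights, threshold value, and all function names are illustrative assumptions, not taken from the patent:

```python
# Hedged sketch of the prior-art weighted hash-distance comparison.
# Weights, threshold, and names are illustrative assumptions.

def hamming(a: str, b: str) -> int:
    """Count differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def weighted_hash_distance(struct_a: str, struct_b: str,
                           time_a: str, time_b: str,
                           w_struct: float = 0.6, w_time: float = 0.4) -> float:
    """Weight the structure hash distance and the time hash distance."""
    return w_struct * hamming(struct_a, struct_b) + w_time * hamming(time_a, time_b)

def is_tampered(distance: float, threshold: float) -> bool:
    # This prior art only answers tampered-or-not; it cannot localize.
    return distance > threshold
```

As the text notes, a single scalar comparison of this kind cannot say where the tampering occurred, which motivates the region-level hashes introduced below.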
  • in digital watermarking, an image with a distinctive look is often used as the watermark and is embedded into all frames or specific frames of the target video by an embedding algorithm.
  • during authentication, the embedded watermark is obtained by an extraction algorithm, and the integrity and correctness of the extracted watermark are evaluated to authenticate the security of the video content.
  • the digital watermarking technology embedding method is more complicated, which is not conducive to video authentication.
  • digital watermarking has an impact on the quality of video images, resulting in inefficient video authentication for digital watermarking.
  • a new video authentication method is needed to achieve the purpose of simplifying the algorithm and accurately locating the tampering position without affecting the quality of the video image.
  • the present invention provides a video authentication method, including:
  • images with the same frame number in the video to be authenticated and the original video are regarded as a group of images to be authenticated, and the hash codes of each pair of corresponding preset regions in the group of to-be-authenticated images are matched in units of preset regions;
  • for each preset region, when its hash code does not match the hash code of the corresponding preset region in the original video, the preset region is determined to be tampered with, and the location information of the preset region determines the location where the video to be authenticated has been tampered with.
  • the generating a hash code to be authenticated for the video to be authenticated includes:
  • each moving target image is regarded as a preset area, and position information of each moving target image in the frame image is generated;
  • frame images whose background images differ by less than or equal to a preset gap are regarded as a set, and for each set, the background image of one frame image is selected from the set to generate the scene image corresponding to each frame image in the set;
  • for each scene image, the scene image is partitioned into blocks, each block is regarded as a preset area, a block hash code is generated for each block, and a scene hash code of the scene image is generated from all block hash codes of the scene image.
  • the method for generating the hash code to be authenticated according to the scenario hash code, the target hash code, and the location information of the moving target image includes:
  • for each frame image of the video to be authenticated, the target structures of the moving target images of the frame image are cascaded according to a preset spatial position relationship, and the identifier of the scene image corresponding to the frame image is added at a preset position, generating a frame image hash code;
  • the scene image is segmented for each scene image, each block is regarded as a preset area, and a block hash code of each piece of scene image is generated, by All block hash codes of the scene image generate a scene hash code of the scene image, including:
  • the scene image is divided into blocks; and each block is regarded as a preset area, and perceptual hash feature extraction is performed on each piece of scene image;
  • for each feature coefficient in the feature coefficient set, a preset number of valley bottoms is calculated in the data range in which the feature coefficient is located, and the data range of the feature coefficient is divided into different quantization intervals by the valley bottoms;
  • the feature coefficients in each quantization interval are quantized and encoded as a hash code; during encoding, the number of differing bits between the hash codes of adjacent feature coefficients is less than or equal to a first preset threshold, the number of differing bits between the hash codes of non-adjacent feature coefficients is greater than a second preset threshold, and the first preset threshold is less than or equal to the second preset threshold;
  • a block hash code is composed of the hash codes of each block of the scene image, and a scene hash code of the scene image is generated from all block hash codes of one scene image.
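  • A minimal sketch of this multi-interval quantization, under two stated assumptions: the "valley bottoms" are taken as the lowest-count histogram bins of the coefficient's data range, and the interval index is encoded with a binary-reflected Gray code, whose adjacent codewords differ in exactly one bit (so similar coefficients falling in adjacent intervals satisfy a first threshold of 1). All names and parameters are illustrative:

```python
import numpy as np

def find_valleys(values, n_valleys=3, bins=32):
    """Use the n_valleys lowest-count histogram bin centers as cut points
    (one simple way to realize 'valley bottoms' of the data range)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    return sorted(centers[np.argsort(hist)[:n_valleys]])

def gray_code(index, width):
    """Binary-reflected Gray code: adjacent indices differ in exactly 1 bit."""
    return format(index ^ (index >> 1), f"0{width}b")

def quantize_encode(values, n_valleys=3):
    """Quantize each feature coefficient into the interval delimited by the
    valleys and encode the interval index with a Gray code."""
    cuts = find_valleys(values, n_valleys)
    width = max(1, len(cuts).bit_length())
    return "".join(gray_code(sum(v > c for c in cuts), width) for v in values)
```

The Gray-code choice is one way to meet the adjacent/non-adjacent bit-distance constraints; the patent does not name a specific code.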
  • treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated and matching the hash codes of each pair of corresponding preset areas in the group in units of preset areas includes:
  • regarding images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated and, in units of blocks, comparing for each block the block hash code of the block with the block hash code of the corresponding block in the original video, and determining whether the number of differing bits between the two block hash codes exceeds a third preset threshold;
  • treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated and matching the hash codes of each pair of corresponding preset areas in the group in units of preset areas includes:
  • images with the same frame number in the video to be authenticated and the original video are regarded as a group of images to be authenticated; in units of moving target images, for each moving target image, the difference between the hash code of the moving target image and the target hash code of the corresponding moving target image in the original video is calculated, and the ratio of the difference to the target hash code of the corresponding moving target image in the original video is calculated;
  • when the calculated ratio is greater than or equal to a fourth preset threshold, the moving target image is determined to be tampered with; when the calculated ratio is less than the fourth preset threshold, the moving target image is determined to be normal.
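  • The two matching modes just described, block-level comparison against a third threshold and moving-target ratio comparison against a fourth threshold, can be sketched as follows. Treating block hashes as bit strings and target hashes as numeric values is an illustrative assumption; the patent does not fix a representation:

```python
def block_matches(block_hash: str, ref_block_hash: str, t3: int) -> bool:
    """Block-level match: the block is considered normal when the number of
    differing bits does not exceed the third preset threshold t3."""
    diff_bits = sum(a != b for a, b in zip(block_hash, ref_block_hash))
    return diff_bits <= t3

def target_matches(target_hash: int, ref_target_hash: int, t4: float) -> bool:
    """Moving-target match: normal when the ratio of the hash difference to
    the reference target hash is below the fourth preset threshold t4."""
    ratio = abs(target_hash - ref_target_hash) / ref_target_hash
    return ratio < t4
```

The tolerance thresholds t3 and t4 are what give the scheme robustness to benign operations while still flagging genuine tampering.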
  • the present invention also provides a video authentication apparatus, the apparatus comprising:
  • the to-be-authenticated hash code generating module is configured to generate a hash code to be authenticated for the video to be authenticated and to obtain a reference hash code of the original video of the video to be authenticated, where the hash code to be authenticated includes, for each frame image of the video to be authenticated, a hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating the position of each preset area in the frame image;
  • the generation algorithm of the reference hash code is the same as the generation algorithm of the hash code to be authenticated;
  • the authentication module is configured to treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and to match the hash codes of each pair of corresponding preset regions in the group of to-be-authenticated images in units of preset regions;
  • a positioning module configured to: for a hash code of each preset area, when the hash code of the preset area does not match the hash code of the corresponding preset area in the original video, determining that the preset area is tampered with, and Determining, according to the location information of the preset area, the location where the video to be authenticated is tampered with.
  • the to-be-certified hash code generating module includes:
  • a processing unit configured to acquire, for each frame image of the video to be authenticated, a background image of the frame image and at least one moving target image formed by pixels of the non-background image of the frame image, and regard each moving target image as a preset area, and generating position information of each moving target image in the frame image;
  • a scene image generating unit configured to treat frame images whose background images differ by no more than a preset gap as a set, and, for each set, to select the background image of one frame image from the set to generate the scene image corresponding to each frame image in the set;
  • a scene hash code generating unit configured to block the scene image for each scene image, treat each block as a preset area, and generate a block hash code of each piece of the scene image, by the scene All block hash codes of the image generate a scene hash code of the scene image;
  • a target hash code generating unit configured to generate a target hash code of the moving target image for each moving target image
  • the to-be-authenticated hash code generating unit is configured to generate the to-be-authenticated hash code of the to-be-authenticated video according to the scenario hash code, the target hash code, and the location information of the moving target image.
  • the to-be-certified hash code generating unit includes:
  • a determining subunit configured to determine an identifier of the scene hash code of each scene image, and to treat the target hash code of each moving target image and the location information of the target image as a target structure;
  • a frame image hash code generating subunit configured to cascade, for each frame image of the to-be-authenticated video, the target structures of the moving target images of the frame image according to a preset spatial position relationship, and to add an identifier of the scene image corresponding to the frame image at a preset position, generating a frame image hash code;
  • the to-be-certified hash code generating sub-unit is configured to generate a hash code to be authenticated of the to-be-authenticated video according to a frame image hash code of each frame image according to a timing relationship of the to-be-authenticated video.
  • the scenario hash code generating unit includes:
  • the perceptual hash feature extraction sub-unit is configured to block the scene image for each scene image; and, each block is regarded as a preset area, and perceptual hash feature extraction is performed on each piece of the scene image;
  • a feature coefficient set generation subunit configured to generate a feature coefficient set of each scene image according to the perceptual hash feature extraction result
  • a coding subunit configured to calculate, for each feature coefficient in the feature coefficient set, a preset number of valley bottoms in the data range in which the feature coefficient is located, to divide the data range of the feature coefficient into different quantization intervals by the valley bottoms, and to quantize and encode the feature coefficients in each quantization interval as a hash code; during encoding, the number of differing bits between the hash codes of adjacent feature coefficients is less than or equal to the first preset threshold, the number of differing bits between the hash codes of non-adjacent feature coefficients is greater than the second preset threshold, and the first preset threshold is less than or equal to the second preset threshold;
  • the scene hash code generating subunit is configured to compose a block hash code from the hash code of each block of the scene image, and to generate a scene hash code of the scene image from all block hash codes of one scene image.
  • the authentication module includes:
  • the block matching unit is configured to: when the preset area is a block of the scene image, treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and in units of blocks, for each block, compare the block hash code of the block with the block hash code of the corresponding block in the original video and determine whether the number of differing bits between the two block hash codes exceeds the third preset threshold;
  • the first determining unit is configured to: when the number of bits of the different bits exceeds the third preset threshold, determine that the block is tampered; and when the number of bits of the different bits is less than or equal to the third preset threshold, determine that the block is normal.
  • the authentication module includes:
  • the moving target image matching unit is configured to: when the preset area is a moving target image, treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and in units of moving target images, for each moving target image, calculate the difference between the hash code of the moving target image and the target hash code of the corresponding moving target image in the original video, and calculate the ratio of the difference to the target hash code of the corresponding moving target image in the original video;
  • the second determining unit is configured to determine that the moving target image is tampered when the calculated ratio is greater than or equal to the fourth preset threshold; and determine that the moving target image is normal when the calculated ratio is less than the fourth preset threshold.
  • the present invention has at least the following beneficial effects: in the video authentication method provided by the embodiments of the present invention, the hash codes of the original video and the video to be authenticated are obtained by the same process and method, and video authentication is performed by matching the two hash codes. Compared with digital watermarking, this method of video authentication is easy to implement and does not affect image quality. Separating the scene image and the moving target image from the video image and generating a hash code for each makes it convenient to determine separately whether the scene image or the moving target image has been tampered with.
  • the video authentication method provided by the embodiment of the present invention can accurately locate the falsified location without affecting the image quality of the video.
  • compared with semi-fragile watermarking, which is less robust to some normal operations (such as smoothing or panning of the video image), the video authentication method provided by the embodiments of the present invention can be applied more generally.
  • a hash code matching method with a tolerance interval is used in video authentication, and the robustness can be improved compared with the existing digital watermarking technology. In turn, the accuracy of video authentication is improved.
  • FIG. 1 is an exemplary flowchart of a video authentication method according to an embodiment of the present invention
  • FIG. 2 is a second exemplary flowchart of a video authentication method according to an embodiment of the present invention.
  • FIG. 3 is a third exemplary flowchart of a video authentication method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a multi-interval non-uniform quantization method of a video authentication method according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a manner of organizing a hash code to be authenticated in a video authentication method according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a video authentication apparatus according to an embodiment of the present invention.
  • FIG. 7 is a second schematic diagram of a video authentication apparatus according to an embodiment of the present invention.
  • the embodiment of the invention provides a video authentication method and device, which is particularly suitable for video authentication of a surveillance video.
  • the video authentication method and apparatus provided by the embodiments of the present invention may be applicable to video of different resolutions and different frame rates, which is not limited by the embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a video authentication method according to an embodiment of the present invention.
  • a hash code of the original video and a hash code of the video to be authenticated are obtained by the same algorithm, and video authentication is performed by matching the two hash codes; compared with digital watermarking, this method of video authentication is easy to implement and does not affect image quality. The scene image and the moving target image are separated from the video image, and a hash code is generated for each.
  • this makes it convenient to judge separately whether the scene image and the moving target image have been falsified; further, the specific position at which the background image portion of a frame image of the video to be authenticated has been tampered with may be determined according to the segmentation of the scene image, and the position at which a moving target image has been tampered with in a frame image may be determined according to the position information of the moving target image.
  • therefore, the video authentication method provided by the embodiments of the present invention can accurately locate the tampered position without affecting the image quality of the video.
  • because the hash features are robust to some normal operations (for example, smoothing and adding noise), the video authentication methods provided are more generally applicable.
  • a hash code matching method with a tolerance interval is used in video authentication, and the robustness can be improved compared with the existing digital watermarking technology. In turn, the accuracy of video authentication is improved.
  • the video authentication method provided by the embodiment of the present invention is described in detail below.
  • FIG. 2 it is an exemplary flowchart of a method for performing video authentication in an embodiment of the present invention, where the method includes the following steps:
  • Step 201 Generate a hash code to be authenticated for the video to be authenticated, and obtain a reference hash code of the original video of the to-be-authenticated video, where the hash code to be authenticated includes: for each frame image of the video to be authenticated, a hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating a position of each preset area in the frame image; and an algorithm and a method for generating the reference hash code The generation algorithm for the authentication hash code is the same.
  • the scene image is used to represent the features of the background image of the video. One frame image is divided into a background image and a moving target image; the background image is represented by the scene image, and the moving target image is composed of pixels determined not to belong to the background image. Further, one frame image may include at least one moving target image.
  • Step 202 Treat the image with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and perform a hash code for each corresponding preset region in the group of to-be-authenticated images in units of preset regions. match.
  • Step 203 For the hash code of each preset area, when the hash code of the preset area does not match the hash code of the corresponding preset area in the original video, it is determined that the preset area is tampered with, and according to the The location information of the preset area determines the location where the video to be authenticated is tampered with.
  • in an embodiment, generating the hash code to be authenticated for the video to be authenticated includes the following steps:
  • Step A1 Acquire, for each frame image of the video to be authenticated, the background image of the frame image and at least one moving target image formed by the pixels of the frame image that do not belong to the background image, treat each moving target image as a preset area, and generate position information of each moving target image in the frame image.
  • Step A2 Treat frame images whose background images differ by no more than a preset gap as a set, and, for each set, select the background image of one frame image from the set to generate the scene image corresponding to each frame image in the set.
  • Step A3 For each scene image, divide the scene image into blocks, regard each block as a preset area, generate a block hash code for each block, and generate a scene hash code of the scene image from all block hash codes of the scene image.
  • Step A4 A target hash code of the moving target image is generated for each moving target image.
  • Step A5 Generate a hash code to be authenticated for the video to be authenticated according to the scene hash code, the target hash code, and the location information of the moving target image.
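  • The assembly in step A5 can be sketched as follows. The separators, the ordering of targets by position, and the `TargetStruct` layout are illustrative assumptions; the patent's actual organization of the hash code is shown in its FIG. 5:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TargetStruct:
    """A moving target's hash code together with its position information."""
    target_hash: str
    position: Tuple[int, int, int, int]  # e.g. (x, y, w, h) in the frame

def frame_hash(scene_id: str, targets: List[TargetStruct]) -> str:
    # Cascade target structures in a preset spatial order (here: by position)
    # and add the scene identifier at a preset position (here: the front).
    ordered = sorted(targets, key=lambda t: t.position)
    body = "|".join(f"{t.position}:{t.target_hash}" for t in ordered)
    return f"{scene_id}#{body}"

def video_hash(frame_hashes: List[str]) -> str:
    # Concatenate frame hash codes according to the timing relationship.
    return ";".join(frame_hashes)
```

Storing only a scene identifier per frame, rather than a full scene hash, is what lets many frames share one scene image and keeps the overall hash code short.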
  • any prior-art method for separating the scene image and the moving target image in step A1 is applicable to the embodiments of the present invention, which is not limited by the present invention.
  • the background image and the moving target image can be extracted by establishing a background model.
  • when extracting the background image and the moving target image through a background model, how to establish the background model, how to generate the scene image corresponding to each frame image, how to determine which pixels in the video belong to a moving target image, and how to determine the position information of a moving target image are explained in detail below:
  • the background model can be established by multi-sample modeling, and the scene image can be updated.
  • the method of establishing a background model using multiple samples is as follows:
  • the background model is initialized in the first frame image of the video to be authenticated, and each pixel in the first frame randomly extracts N samples in its preset neighborhood to form an initial background model.
  • Each pixel point in the background model corresponds to a sample model, and each sample model is composed of samples of a preset number of samples, Each sample corresponds to one sample value.
  • the above sample model can be represented by the formula (1):
  • Model(i, j) = {sample_1, sample_2, ..., sample_N}  (1)
  • where Model(i, j) represents the sample model of the pixel at position (i, j), sample_k represents a sample value in the sample model, and N is the number of samples.
  • the sample value of each sample can be represented by the pixel value of the sample.
  • the moving target image may be extracted as follows: for each pixel in each frame image of the video to be authenticated, the pixel value of the pixel is compared with each sample value in the sample model at the same position in the current background model, and the sample gap between the pixel value of the pixel and each sample value is calculated, yielding N sample gaps, where N is the number of samples in a sample model.
  • when the sample gap is smaller than a sample threshold, the pixel is determined to be a background point (that is, the pixel belongs to the scene image); otherwise, the pixel is determined to be a pixel of a moving target image.
  • the sample gap between the pixel value of the pixel and a sample value can be determined by formula (2):
  • diff = |value(i, j) − sample_i|  (2)
  • where diff represents the sample gap, value(i, j) represents the pixel value of the pixel at position (i, j), and sample_i represents the i-th sample value of the sample model at position (i, j).
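  • The model initialization (formula (1)) and pixel classification just described can be sketched as follows. Two assumptions are made explicit: the sample gap is taken as the absolute pixel-value difference, and `min_matches=1` follows the text literally (one small gap suffices for a background point), while ViBe-style background subtraction typically requires two matches. Parameter values are illustrative:

```python
import random

def init_background_model(frame, n_samples=20, radius=1):
    """Initialize Model(i, j) from N samples drawn at random from each
    pixel's preset neighborhood in the first frame (formula (1))."""
    h, w = len(frame), len(frame[0])
    model = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            samples = []
            for _ in range(n_samples):
                # clamp the random neighbor to the image bounds
                ni = min(max(i + random.randint(-radius, radius), 0), h - 1)
                nj = min(max(j + random.randint(-radius, radius), 0), w - 1)
                samples.append(frame[ni][nj])
            model[i][j] = samples
    return model

def is_background(pixel_value, samples, sample_threshold=16, min_matches=1):
    """Classify a pixel: compute diff = |value - sample| for each sample and
    call the pixel a background point when enough gaps fall below the
    sample threshold; otherwise it belongs to a moving target image."""
    close = sum(abs(pixel_value - s) < sample_threshold for s in samples)
    return close >= min_matches
```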
  • the difference between the backgrounds of the multi-frame images in the video to be authenticated may be relatively large.
  • in this case, the background image of each frame image of the video to be authenticated cannot be accurately described by a fixed background model, and such a background model cannot accurately extract the moving target image. Therefore, in order to accurately describe the features of the background image of each frame image and thus accurately extract the moving target image, the background model must be updated and the moving target image extracted using the updated background model.
  • the specific updating method may be: starting from the second frame image, for each pixel in each frame image, when it is determined that the pixel does not belong to a moving target image (that is, it is a background point of the background image), the background model is updated at that pixel.
  • specifically, a random sample value in a sample model at a random position within the preset neighborhood of the pixel is updated. For example, when the preset neighborhood is an 8-neighborhood, the pixel has at most 8 neighboring positions; a position is randomly selected from the preset neighborhood, the sample model at that position is selected, and a sample value is randomly selected from the selected sample model for updating.
  • this parameter can be determined empirically; for example, it can be set to 16. It should be noted that any prior-art method for updating the scene image is applicable to the embodiments of the present invention, which is not limited herein.
  • the extraction process of the moving target image can be described as follows: for the first frame image of the video to be authenticated, the moving target image is extracted according to the initialized background model; in subsequent frames, the background model is updated according to the background points, and the moving target image is then extracted according to the updated background model, so that extraction and model updating proceed in alternation.
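  • The stochastic update step for a background pixel can be sketched as follows; the neighborhood radius and clamping at the image border are illustrative assumptions:

```python
import random

def update_background_model(model, i, j, pixel_value, radius=1):
    """For a pixel classified as background: pick a random position in the
    preset neighborhood of (i, j), pick a random sample in that position's
    sample model, and overwrite it with the new pixel value."""
    h, w = len(model), len(model[0])
    ni = min(max(i + random.randint(-radius, radius), 0), h - 1)
    nj = min(max(j + random.randint(-radius, radius), 0), w - 1)
    k = random.randrange(len(model[ni][nj]))
    model[ni][nj][k] = pixel_value
```

Because only background points trigger updates, moving targets do not pollute the model, and the random replacement lets the model slowly absorb gradual background changes.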
  • a multi-frame image can share one scene image.
  • frame images whose background images differ by no more than the preset gap are regarded as a set, and for each set, the background image of one frame image is selected from the set to generate the scene image corresponding to each frame image in the set.
  • for example, suppose the video to be authenticated has a total of 100 frames, labeled 1-100, and the differences between the background images of frames 1 through 30 are all less than or equal to the preset gap; then frames 1 through 30 form one set, and the background image of one frame image is selected from the set to generate the scene image of each frame in the set.
  • in this way, each frame image can be represented by a scene image and moving target images. Since multiple frame images share one scene image, when the hash code is generated later, the background images of the multiple frame images are replaced by the scene image; this reduces the length of the hash code and improves the efficiency of hash code matching, thereby improving the efficiency of video authentication.
  • the generation of the scene image may include two processes of establishing and updating the scene image, as follows:
  • the moving target image of the first frame image may be extracted according to the background model, and the background image of the first frame image is formed by the pixels not belonging to the moving target image.
  • the background image of the first frame image is generated as an initial scene image. Then enter the processing flow of the next frame image.
  • the moving target image and the background image in the frame image are extracted using the background model, and the background model is continuously updated with the pixel points of the background image.
  • when the result of the comparison does not satisfy the condition for updating the scene image (that is, when the gap between the background images is less than or equal to the preset gap), the initial scene image is used as the scene image corresponding to the second frame image; when the comparison result satisfies the condition for updating the scene image (that is, when the gap between the background images is greater than the preset gap), a scene image is generated from the background image of the second frame image, used as the updated scene image, and taken as the scene image corresponding to the second frame image. Then the processing flow of the next frame image is entered.
  • the method for processing the frame of the third frame to the last frame of the video to be authenticated is the same as the method for processing the image of the second frame, and details are not described herein again.
  • when the gap between the background image of the current frame image and the current scene image is less than or equal to the preset gap, that is, when the gap between the background image of the current frame image and the background image of the previous frame image is less than or equal to the preset gap, the current frame image is placed into the same set as the previous frame image. In this way, the set to which each frame image belongs is obtained.
  • any background image in a set may be selected to be used as a scene image for generating the set, which is not limited in the present invention.
  • generating the scene image according to the background image may be performed by filling, in one frame image, the pixel values of the pixels other than the pixel points of the background image according to a preset rule.
  • the preset rule is, for example, using a default pixel value, or selecting a pixel value from the corresponding position in the background model. It should be noted that a pixel value at a preset position may be selected from the corresponding position in the background model in order to reduce the gap caused by randomly selecting pixel values, the gap here being, for example, the gap between corresponding scene images of the video to be authenticated and the original video.
  • the scene image is updated to realize that the multi-frame image corresponds to one scene image.
  • the current scene image may be updated by calculating a frame difference between the scene image generated by the currently extracted background image and the current scene image.
  • the frame difference can be represented by a ratio: first, the pixel value difference between the scene image generated from the background image extracted from the current frame image and the pixel at the corresponding position in the current scene image is calculated; the number of pixels whose pixel value difference exceeds the preset per-pixel frame difference gap is then divided by the total number of pixels of the entire frame image, and the resulting ratio represents the frame difference.
  • the frame difference is greater than the preset frame difference threshold, the current scene image is updated.
  • the frame difference can be calculated by equation (3):

    ratio = NUM_change / NUM_total    (3)

  • where ratio represents the frame difference; NUM_change represents the number of pixels whose pixel value difference exceeds the preset frame difference gap; and NUM_total represents the total number of pixels of the entire frame image.
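Equation (3) can be rendered in a few lines; flat lists of grey values stand in for images here, which is an assumption for illustration.

```python
def frame_difference(candidate, current_scene, pixel_gap):
    """Equation (3): share of pixels whose value change exceeds the preset
    per-pixel frame difference gap (ratio = NUM_change / NUM_total)."""
    num_change = sum(1 for a, b in zip(candidate, current_scene)
                     if abs(a - b) > pixel_gap)
    return num_change / len(candidate)

def should_update_scene(candidate, current_scene, pixel_gap, frame_diff_threshold):
    """Update the scene image when the frame difference exceeds the preset
    frame difference threshold."""
    return frame_difference(candidate, current_scene, pixel_gap) > frame_diff_threshold
```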
  • Manner 1: Calculate the color difference between pixels at corresponding positions in the scene image generated from the extracted background image and in the current scene image, and compute the average color difference between the two images. When the average color difference is greater than the color difference threshold, the scene image is updated; when the average color difference is less than or equal to the color difference threshold, the scene image is not updated.
  • Manner 2: Calculate the color difference between pixels at corresponding positions in the scene image generated from the extracted background image and in the current scene image, and count the number of pixels whose color difference is greater than the color difference threshold. When the ratio of this number to the total number of pixels of the entire frame image is greater than the preset ratio, the scene image is updated; otherwise, the scene image is not updated.
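The two update manners can be sketched as follows. Flat lists of grey values stand in for images and the absolute value difference stands in for the color difference; both are simplifying assumptions.

```python
def manner1_update(new_scene, current_scene, color_diff_threshold):
    """Manner 1: update when the average per-pixel color difference between
    the two images exceeds the color difference threshold."""
    avg = sum(abs(a - b) for a, b in zip(new_scene, current_scene)) / len(new_scene)
    return avg > color_diff_threshold

def manner2_update(new_scene, current_scene, color_diff_threshold, preset_ratio):
    """Manner 2: update when the share of pixels whose color difference
    exceeds the threshold is greater than the preset ratio."""
    n = sum(1 for a, b in zip(new_scene, current_scene)
            if abs(a - b) > color_diff_threshold)
    return n / len(new_scene) > preset_ratio
```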
  • any method capable of indicating the difference between the images in order to determine whether to update the scene image is applicable to the embodiment of the present invention, which is not limited by the embodiment of the present invention.
  • in this way, the accuracy with which the scene image describes the background can be improved under the premise of ensuring that multiple frame images share one scene image.
  • each moving target image is a set of pixel points composed of at least one pixel, and the position information of the moving target image can be determined from the positions of its pixels in the frame image.
  • the moving target image may be described by the smallest figure capable of enclosing it, such as a rectangle, a square, or a circle, and the position information of that figure in one frame image is used to represent the position information of the moving target image in the video to be authenticated.
  • taking a rectangle as an example, the position in the frame image of the top left corner of the rectangle, together with the width and height of the rectangle, may be used to represent the position information of the moving target image.
  • the moving target images may be represented by formula (4):

    rect_i = (x_i, y_i, width_i, height_i),  i = 1, 2, …, n    (4)

  • where rect_i represents the i-th moving target image; (x_i, y_i) represents the coordinates of the top left corner vertex of the rectangle (that is, the position in the frame image of the moving target image that the rectangle encloses); width_i represents the width of the rectangle; height_i represents the height of the rectangle; and n is the number of moving target images.
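Formula (4)'s rectangle description can be derived from a moving target's pixel set as below; treating pixels as (x, y) pairs is an illustrative convention.

```python
def bounding_rect(pixels):
    """Smallest axis-aligned rectangle enclosing a set of (x, y) pixels,
    returned as (x, y, width, height) with (x, y) the top left corner,
    mirroring formula (4)."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x + 1, max(ys) - y + 1)
```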
  • the moving target images may be numbered in units of each frame image, in which case the value of n in formula (4) may differ from frame image to frame image; or the whole video to be authenticated may be used as a unit to number the moving target images, in which case the value of n in formula (4) is a fixed value for a given video to be authenticated.
  • the method for indicating the position of the moving target image in the prior art is applicable to the embodiment of the present invention, which is not limited by the present invention.
  • the following describes how to generate the scene hash code representing the scene image and the target hash code representing the moving target image. Specifically, the process of generating the scene hash code of the scene image can include the following:
  • step A3 may be performed as follows:
  • Step A31: For each scene image, divide the scene image into blocks; treating each block as a preset area, perform perceptual hash feature extraction on each block of the scene image.
  • the scene image may be evenly divided into blocks, or may be non-uniformly divided.
  • non-uniform partitioning means, for example, that when the image information carrying the features that describe the scene image is concentrated in part of the area, that part may be divided into many small blocks, while an area whose features change little (for example, an area of uniform color) is divided into larger blocks.
  • the perceptual hash is a kind of one-way mapping from the multimedia data set to the perceptual digest set, that is, the multimedia digital representation with the same perceptual content is uniquely mapped into a piece of digital digest, and the perceptual robustness and security are satisfied.
  • the data volume of the original image data set can be significantly reduced, and the perceptual digest set represents the characteristics of the original image.
  • the perceptual hash feature extraction method of the scene image needs to extract finer features as much as possible. Therefore, the transform domain feature extraction method may be adopted.
  • perceptual hash feature extraction of the scene image may use a block DCT (Discrete Cosine Transform), a block DWT (Discrete Wavelet Transform), a block DFT (Discrete Fourier Transform), and so on. Any method that can extract fine perceptual hash features is applicable to the embodiment of the present invention, which is not limited thereto.
  • Step A32 Generate a feature coefficient set of each scene image according to the perceptual hash feature extraction result.
  • taking the block DCT transform as an example, a DCT coefficient matrix of each block can be obtained, and DC coefficients of a preset first coefficient number and AC coefficients of a preset second coefficient number are extracted from the coefficient matrix as the feature coefficients in the feature coefficient matrix of the block. Preferably, so that the features of the scene image are effectively characterized, the coefficients in the DCT coefficient matrix are sorted in descending order, and when coefficients of a preset number (the first coefficient number or the second coefficient number) are extracted, the top-ranked coefficients of that number are taken.
  • alternatively, a preset number of AC coefficients may be selected according to the positions of the AC coefficients in the DCT coefficient matrix; for example, the AC coefficients at the positions where the sum of the row number and the column number is smaller than a preset sum value may be selected.
  • the matrix of feature coefficients of all the blocks in the scene image constitutes a set of feature coefficients of the scene image.
  • the feature coefficient matrix of each block can be represented by formula (5), and the feature coefficient set of each scene image can be represented by formula (6):

    Y_i = [y_1, y_2, …, y_(n1+n2)]    (5)

  • where Y_i represents the feature coefficient matrix of the i-th block; y represents a feature coefficient in the feature coefficient matrix; n1 represents the first coefficient number; and n2 represents the second coefficient number.

    FEATURE = [Y_1, Y_2, …, Y_N]    (6)

  • where FEATURE represents the feature coefficient set; Y_1 represents the feature coefficient matrix of block 1 in the scene image; Y_2 represents the feature coefficient matrix of block 2 in the scene image; Y_N represents the feature coefficient matrix of block N in the scene image; and N is the number of blocks of the scene image.
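Steps A31-A32 can be sketched in pure Python as below. The 4x4 block size, the naive O(n^4) DCT-II, and the choice of the DC coefficient plus the five largest-magnitude AC coefficients (n1 = 1, n2 = 5) are illustrative assumptions.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (naive, for illustration)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

def block_features(block, n_ac=5):
    """Feature coefficients of one block: the DC coefficient followed by the
    n_ac largest-magnitude AC coefficients."""
    coeffs = dct2(block)
    n = len(coeffs)
    acs = [coeffs[u][v] for u in range(n) for v in range(n) if (u, v) != (0, 0)]
    acs.sort(key=abs, reverse=True)
    return [coeffs[0][0]] + acs[:n_ac]
```

Collecting `block_features` over all blocks of one scene image yields the feature coefficient set of formula (6).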
  • Step A33: For each row of feature coefficients in the feature coefficient set, calculate a preset number of valley bottoms in the data range in which the feature coefficients are located, and divide the data range of the feature coefficients into different quantization intervals by the valley bottoms; the feature coefficients in each quantization interval are quantized and encoded as a hash code. In the encoding, the number of differing bits between the hash codes of adjacent feature coefficients is less than or equal to a first preset threshold, the number of differing bits between the hash codes of non-adjacent feature coefficients is greater than a second preset threshold, and the first preset threshold is less than or equal to the second preset threshold.
  • Step A34: Compose the block hash code of each block from the hash codes of its feature coefficients, and generate the scene hash code of the scene image from all the block hash codes of one scene image.
  • each element in the feature coefficient matrix of each block is a feature coefficient. For example, continuing the previous example, if each block image is DCT transformed and 5 AC coefficients and one DC coefficient are obtained, the number of types of feature coefficients is 6 (that is, 5+1).
  • when the feature coefficient set is formed as in formula (6), each row of feature coefficients in the feature coefficient set is composed of N feature coefficients from different blocks, and each row consists of all instances of one type of feature coefficient.
  • after the feature coefficient set of the scene image is obtained, in order to reduce the redundant information in the feature coefficient set and reduce the transmission cost, the feature coefficient set needs to be quantized.
  • the quantization method must reduce redundant information while ensuring the robustness and sensitivity of the video authentication result.
  • the feature coefficient set is quantized and encoded by a multi-interval non-uniform quantization method.
  • suppose each block image is subjected to a DCT transform to obtain 5 AC coefficients and 1 DC coefficient, and each row of the feature coefficient set corresponds to one type of feature coefficient; the quantization of each row of feature coefficients can then be performed as follows:
  • Step A33-1: Calculate a data histogram of each row of feature coefficients according to the data range of the row and a preset number of intervals.
  • specifically, the data range determined by the maximum value and the minimum value of each row of feature coefficients is divided into the preset number of intervals, and histogram statistics are performed on these intervals to count the number of feature coefficients falling into each interval.
  • Step A33-2: Starting from the interval in which the median value of each row of feature coefficients is located, search toward both sides of the histogram for valley bottoms.
  • the valley bottoms are found by searching from the interval containing the median toward both sides: if the number of feature coefficients in the current interval is smaller than the numbers in its two adjacent intervals, the current interval is recorded as a valley bottom. If the preset number of quantization intervals is m, then m-1 valley bottoms need to be found. Preferably, in order to satisfy the symmetry requirement, the number m of quantization intervals may be an odd number.
  • Step A33-3: Using the valley bottoms as the threshold values of the quantization intervals, quantize the feature coefficients in each quantization interval.
  • in the encoding, the number of differing bits between the hash codes of adjacent feature coefficients is less than or equal to the first preset threshold, the number of differing bits between the hash codes of non-adjacent feature coefficients is greater than the second preset threshold, and the first preset threshold is less than or equal to the second preset threshold. The farther apart two feature coefficients are, the more bits differ between their hash codes, so that the difference between the corresponding feature coefficients can be judged from the number of differing bits of their hash codes. In this way, the quantization of the feature coefficient set is completed.
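Steps A33-1 to A33-3, together with the bit-distance constraint, can be sketched as below. Several concrete choices are assumptions, since the text names none: a valley bottom is a histogram interval holding strictly fewer coefficients than both neighbours, valley centres serve as thresholds, and a thermometer (unary) code is used so that codes of adjacent intervals differ in exactly 1 bit while intervals k apart differ in k bits.

```python
def histogram(values, bins):
    """Bin one row of feature coefficients over its min-max data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # avoid zero-width bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts, lo, width

def valley_thresholds(values, bins):
    """Centres of histogram intervals holding strictly fewer coefficients
    than both neighbours; these become the quantization thresholds."""
    counts, lo, width = histogram(values, bins)
    valleys = [i for i in range(1, bins - 1)
               if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]
    return [lo + (i + 0.5) * width for i in valleys]

def quantize(value, thresholds):
    """Index of the quantization interval the value falls into."""
    return sum(value > t for t in thresholds)

def thermometer_code(interval, n_intervals):
    """Hash code whose Hamming distance to another interval's code equals
    the distance between the interval indices."""
    return '1' * interval + '0' * (n_intervals - 1 - interval)

row = [1, 1, 2, 2, 5, 9, 9, 10, 10, 10]    # one row of feature coefficients
thr = valley_thresholds(row, bins=3)        # one valley between the two modes
codes = [thermometer_code(quantize(v, thr), 3) for v in row]
```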
  • the block hash codes of each scene image may be connected end to end in a preset order (for example, in spatial order from top to bottom and from left to right) to obtain the scene hash code of the scene image. It should be noted that any method for sequentially connecting the hash codes of the blocks is applicable to the embodiment of the present invention, which is not limited thereto.
  • the process of generating the target hash code of the moving target image may include the following contents:
  • the process of generating the target hash code of the moving target image also includes two parts, perceptual hash feature extraction and feature quantization coding, wherein the hash code generation method of the moving target image may be the same as or different from that of the scene image.
  • the ratio of the moving target image to the entire frame image is greater than or equal to the preset ratio, that is, when the moving target image occupies a larger portion of the entire frame image, the same hash code generating method as the scene image may be adopted.
  • the proportion of the moving target image to the entire frame image is less than the preset ratio, that is, when the moving target image occupies a small portion of the entire frame image, since the feature of the moving target image itself is relatively obvious, only the rough overall feature needs to be extracted.
  • the overall Hu moment can be extracted to roughly represent the overall feature of the moving target image.
  • seven Hu moments of the moving target image are obtained as the feature coefficients of the moving target image.
  • the quantization of the seven feature coefficients is realized by the rounding operation, which is convenient for realizing and quickly generating a hash code of the moving target image, thereby improving the speed of video authentication.
  • the method for realizing the quantization of the feature coefficient matrix of the moving target image in the prior art is applicable to the embodiment of the present invention, which is not limited by the embodiment of the present invention.
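The rounding-based quantization of the seven Hu moments can be sketched as follows. Because Hu moments span many orders of magnitude, a signed log transform is applied before rounding here; that transform and the scale factor are assumptions for illustration, since the text only specifies a rounding operation.

```python
import math

def quantize_hu(hu_moments, scale=100):
    """Quantize seven Hu moments into integers by rounding, after a signed
    log transform (an assumed preprocessing step; the text only names the
    rounding operation)."""
    out = []
    for m in hu_moments:
        v = 0.0 if m == 0 else -math.copysign(1.0, m) * math.log10(abs(m))
        out.append(round(v * scale))
    return out
```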
  • the hash code to be authenticated can be generated, specifically:
  • step A5 can be performed as follows:
  • Step A51: Determine the identifier of the scene hash code of each scene image; and regard the target hash code of each moving target image together with the position information of the moving target image as one target structure.
  • pointing to the scene hash code by means of a pointer can be regarded as one way of determining the identifier of the scene hash code.
  • Step A52: For each frame image of the video to be authenticated, cascade the target structures of the moving target images of the frame image according to a preset spatial position relationship, and add the identifier of the scene image corresponding to the frame image at a preset position, generating a frame image hash code.
  • specifically, for each frame image, the target structures are cascaded in order of the spatial positions of their corresponding moving target images in the frame image, from top to bottom and from left to right; then, after the last target structure, the scene hash code of the scene image corresponding to the frame image is referenced by a pointer, finally generating the frame image hash code.
  • the multi-frame image shares the scene hash code of one scene image.
  • Step A53 Cascading the frame image hash code of each frame image according to the timing relationship of the video to be authenticated, and generating a hash code to be authenticated for the video to be authenticated.
  • that is, the frame image hash codes are cascaded in the chronological order of the video, thereby generating the final hash code to be authenticated.
  • since one scene image is shared by multiple frame images, the number of scene image hash codes is reduced, which reduces the length of the hash code of the entire video to be authenticated; the comparison length of the hash codes during video authentication is therefore reduced, which improves the speed of video authentication.
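Steps A51-A53 can be sketched structurally as below, with assumed simple types: a target structure is a (position, target hash) pair, targets are cascaded top-to-bottom then left-to-right, and the shared scene image contributes only an identifier, so many frames reference one scene hash code.

```python
def frame_hash(target_structures, scene_id):
    """target_structures: list of ((x, y, w, h), target_hash) for one frame.
    Cascade the targets by spatial position (top to bottom, left to right),
    then append the identifier of the frame's shared scene image."""
    ordered = sorted(target_structures, key=lambda t: (t[0][1], t[0][0]))
    return ''.join(h for _, h in ordered) + '|scene:' + scene_id

def video_hash(frames):
    """frames: list of (target_structures, scene_id) in temporal order;
    cascade the frame image hash codes chronologically."""
    return '#'.join(frame_hash(t, s) for t, s in frames)
```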
  • when the video to be authenticated is transmitted to the user through a transmission channel (for example, a network), the video information of the video to be authenticated inevitably changes to different degrees during the transmission, and the change will have a certain impact on the video authentication result. Therefore, in order to reduce the impact of this change on video authentication, the video to be authenticated can be smoothed before step A1. Specifically, although the human eye is sensitive to the color information of the video image, its sensitivity to luminance information is much greater than to the hue and saturation information in the color information; therefore, when smoothing the video image, the luminance information of the video to be authenticated can be extracted, and only the luminance information is smoothed.
  • the Y information of the YUV color space can be smoothed, where "Y” represents brightness (Luminance or Luma), that is, gray scale value; and "U” and “V” represent chroma (Chrominance or Chroma). ), the role is to describe the image color and saturation, used to specify the color of the pixel. If the color space used by the video image is not the YUV color space, the color space of the video image can be converted to the YUV color space according to the prior art, and the brightness smoothing process can be performed. It should be noted that other color information may be smoothed to reduce the impact of changes in the above transmission process on video authentication, and other prior art methods may be used to reduce the change in the above transmission process. The purpose of the impact of the video authentication is not limited in this embodiment of the present invention.
  • the smoothing processing method may be performed by Gaussian smoothing, and other smoothing processing methods provided in the prior art may be used, which is not limited in the embodiment of the present invention.
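A minimal sketch of the smoothing step: only the luminance (Y) plane is filtered with a 3x3 Gaussian kernel, leaving chroma untouched. The kernel weights and the clamped border handling are illustrative assumptions.

```python
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # 3x3 Gaussian weights, sum = 16

def gaussian_smooth_y(y_plane):
    """Smooth a 2-D list of Y (luminance) values; borders are clamped."""
    h, w = len(y_plane), len(y_plane[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)   # clamp at image border
                    jj = min(max(j + dj, 0), w - 1)
                    acc += KERNEL[di + 1][dj + 1] * y_plane[ii][jj]
            out[i][j] = acc // 16
    return out
```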
  • the following is a detailed description of the video authentication process, that is, the matching process between the hash code to be authenticated and the reference hash code of the original video. The frame image hash codes can be matched in units of frame images, and the matching of each frame image hash code can be divided into the scene hash code matching of the scene image and the target hash code matching of the moving target image.
  • specifically, step 202 may be performed as follows: the frame image to be authenticated and the frame image with the same frame number in the original video are treated as a group; in units of blocks, for each block, the block hash code is compared with the block hash code of the corresponding block in the original video, and it is determined whether the number of differing bits between the two block hash codes exceeds a third preset threshold. When the number of differing bits exceeds the third preset threshold, it is determined that the block has been tampered with; when the number of differing bits is less than or equal to the third preset threshold, it is determined that the block is normal.
  • the third preset threshold may be the same as the foregoing first preset threshold, where the first preset threshold is the threshold used when encoding the scene image, which requires the number of differing bits between the hash codes of adjacent coefficients to be no greater than the first preset threshold.
  • the third preset threshold may also be different from the foregoing first preset threshold, and may be set based on experience or user requirements.
  • when the third preset threshold is the same as the first preset threshold and the number of differing bits exceeds it, then, since by the foregoing requirement adjacent feature coefficients differ in few bits and non-adjacent feature coefficients differ in many bits, it can be determined that the feature coefficient corresponding to the hash code has jumped from one quantization interval, over the entire adjacent quantization interval, into a non-adjacent quantization interval. That is, the change of the feature coefficient has moved it by more than a whole quantization interval, beyond the tolerance range, so it is determined that the feature coefficient has changed, that is, the block corresponding to the feature coefficient has changed, and therefore the block has been tampered with.
  • conversely, when the number of differing bits is less than or equal to the third preset threshold, the change of the feature coefficient has not crossed the preset tolerance interval, that is, it is within the tolerance range, so it is determined that the feature coefficient has not changed, that is, the block corresponding to the feature coefficient has not changed. Since a block has multiple feature coefficients in its feature coefficient matrix, when the hash code of at least one feature coefficient differs from the hash code of the corresponding coefficient of the corresponding block in the original video by more than the third preset threshold, the block has been tampered with; otherwise, the block is normal.
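The block-level comparison above reduces to counting differing bits against the third preset threshold; a sketch, with hash codes represented as bit strings:

```python
def hamming(a, b):
    """Number of differing bits between two equal-length hash codes."""
    return sum(x != y for x, y in zip(a, b))

def block_tampered(block_hash, original_block_hash, third_threshold):
    """A block is flagged as tampered when its hash differs from the
    original video's block hash in more than third_threshold bits."""
    return hamming(block_hash, original_block_hash) > third_threshold

def authenticate_frame(block_hashes, original_block_hashes, third_threshold):
    """Indices of blocks judged tampered in one frame image."""
    return [i for i, (h, o) in enumerate(zip(block_hashes, original_block_hashes))
            if block_tampered(h, o, third_threshold)]
```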
  • through the tolerance interval and the multi-interval non-uniform quantization method described above, the video authentication method used in the embodiment of the present invention is, compared with video authentication methods in the prior art, robust to some normal operations, thereby improving the accuracy of video authentication.
  • in the first case, when the ratio of the moving target image to the entire frame image is greater than or equal to the preset ratio, that is, when the hash code generation method of the moving target image is the same as that of the scene image, the hash code matching method of the moving target image is the same as that of the scene image and is not repeated here.
  • in the second case, when the proportion of the entire frame image occupied by the moving target image is smaller than the preset ratio, that is, when the rough overall features of the moving target image are extracted in units of the moving target image, the hash codes may be matched in units of the moving target image. Specifically, for each moving target image, the difference between its hash code and the target hash code of the corresponding moving target image in the original video is calculated, together with the proportion this difference occupies of the value in the original video; when the calculated change rate (that is, the ratio) exceeds a fourth preset threshold, it is determined that the feature coefficient has changed.
  • since the moving target image occupies a small proportion of the entire frame image, that is, the moving target image is small, a tampering operation has a great influence on it. Therefore, for the hash codes of the multiple feature coefficients in the target hash code, when the hash code of at least one feature coefficient changes, it is determined that the moving target image has been tampered with. The change rate is calculated by formula (7):

    ChangeRatio_i = |H_i - H'_i| / H_i    (7)

  • where ChangeRatio_i represents the change rate of the i-th hash code of the moving target image; H_i represents the hash code of the i-th feature coefficient of the corresponding moving target image in the original video; and H'_i represents the hash code of the i-th feature coefficient of the moving target image in the video to be authenticated.
  • the value of the fourth preset threshold may be determined according to actual requirements.
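The change-rate test can be sketched as follows, treating each quantized feature hash as an integer and reading the change rate as the difference taken relative to the original video's value; both readings are assumptions for illustration.

```python
def change_ratios(auth_hash, orig_hash):
    """Per-coefficient change rates between the hash of the moving target
    image in the video to be authenticated and in the original video."""
    return [abs(a - o) / abs(o) if o else float(a != o)
            for a, o in zip(auth_hash, orig_hash)]

def target_tampered(auth_hash, orig_hash, fourth_threshold):
    """Tampered when at least one coefficient's change rate exceeds the
    fourth preset threshold."""
    return any(r > fourth_threshold for r in change_ratios(auth_hash, orig_hash))
```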
  • for a block, the change rate of each feature coefficient can be calculated in units of blocks; when the change rate of at least one coefficient exceeds the fourth preset threshold, it is determined that the block has been tampered with.
  • in the embodiment of the present invention, the hash codes of the original video and the video to be authenticated are obtained by the same process and method, and the video is authenticated by the hash codes; compared with digital watermarking, this video authentication method is easy to implement and does not affect the quality of the image.
  • the specific position where the scene image is tampered may be determined according to the block of the scene image, and the position where the moving target image is tampered is determined according to the position information of the moving target image.
  • the hash code comparison method with tolerance interval is used in video authentication, which can improve the robustness and improve the accuracy of video authentication compared with the existing digital watermarking technology. .
  • in this embodiment, video authentication of a surveillance video is taken as an example. Since the scene of a surveillance video is relatively stable, at least one scene image may be established to represent the scene of the surveillance video. For the scene image of the surveillance video, perceptual hash feature extraction may be performed by blocking the image in advance and DCT transforming each block; the feature coefficient matrix of each block is composed of one DC coefficient and five AC coefficients, and the block hash code of each block is then obtained by multi-interval non-uniform quantization.
  • for the moving target images, each moving target image may be used as a unit, and the seven Hu moments obtained for each moving target image serve as the feature coefficient matrix of the moving target image; the hash code of the moving target image is then obtained by the rounding operation. Finally, the block hash codes of the scene image are matched by the tolerance interval method in units of blocks, and the moving target images are matched using the change rate.
  • the method includes the following steps:
  • Step 301 Initialize the background model in the first frame image of the video to be authenticated.
  • the background model may be initialized by, for each pixel of the first frame image, randomly extracting 20 samples from its 8-neighborhood to form the initial sample model of that pixel.
  • Step 302: Extract the moving target image and the background image from the first frame image of the video to be authenticated according to the initialized background model, generate the position information of each moving target image, and generate the current scene image from the background image.
  • for each pixel in the first frame image, formula (2) is used to determine whether the pixel belongs to the background (that is, belongs to the background model). Specifically, the pixel is compared with each sample value in the sample model of the pixel at the corresponding position in the background model to obtain sample gaps.
  • if a sufficient number of the obtained sample gaps are smaller than the sample threshold, the pixel is determined to be a background point; otherwise, the pixel is a pixel of the moving target image.
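The per-pixel decision of step 302 can be sketched as below. The minimum match count and the exact comparison rule are assumptions in the spirit of sample-based background models, since formula (2) itself is not reproduced in this text.

```python
def classify_pixel(value, samples, sample_threshold=20, min_matches=2):
    """Background point when enough model samples lie within
    sample_threshold of the pixel's current value; otherwise the pixel
    belongs to a moving target image."""
    matches = sum(1 for s in samples if abs(s - value) < sample_threshold)
    return 'background' if matches >= min_matches else 'foreground'
```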
  • according to the aforementioned formula (4), the position information of the moving target image is represented by the top left corner vertex of the smallest rectangle that can enclose the moving target image, together with the width and height of the rectangle.
  • Step 303 Update the background model from the second frame image of the video to be authenticated, and extract the moving target image and the background image using the updated background model for each frame image.
  • Step 304: Starting from the second frame image of the video to be authenticated, determine for each frame image, from the extracted background image, whether the current scene image is updated. When it is, use the updated scene image as the scene image corresponding to the frame image; when it is not, use the current scene image as the scene image corresponding to the frame image.
  • The scene image of a surveillance video is relatively stable and changes little over a short time. The current scene image may be updated by the frame-difference or color-difference method described in Embodiment 1, which is not repeated here.
  • Step 305: For each scene image, divide the scene image into blocks, generate the block hash code of each block, and generate the scene hash code of the scene image from all of its block hash codes.
  • Step 305 can be performed as follows:
  • Step C1: For each scene image, divide the scene image into blocks; perform a DCT transform on each block, and select one DC coefficient and five larger AC coefficients from the transformed DCT coefficient matrix as the feature coefficients of the block.
  • For example, the AC coefficients at positions where the sum of the row index and column index is less than a preset value of 5 may be selected: for the position in the first row and first column, the sum of the row and column indices is 2, which is less than the preset sum of 5, so the AC coefficient at that position is selected.
  • Alternatively, the AC coefficients in the DCT coefficient matrix may be arranged in descending order and the five top-ranked AC coefficients selected.
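Step C1 can be sketched as follows, using a descending-order selection of the AC coefficients as the text describes. The hand-rolled orthonormal DCT-II (to avoid a SciPy dependency) and the function names are assumptions; block size and coefficient counts follow the embodiment (one DC plus five AC).

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def block_features(block, n_ac=5):
    """Feature coefficients of one block: the DC coefficient plus the
    n_ac AC coefficients ranked first in descending order."""
    coeffs = dct2(block.astype(np.float64))
    dc = coeffs[0, 0]
    ac = coeffs.flatten()[1:]          # every coefficient except the DC term
    top_ac = np.sort(ac)[::-1][:n_ac]  # descending order, keep the top five
    return np.concatenate([[dc], top_ac])
```

A flat (constant) block therefore yields only a DC term, with all selected AC coefficients near zero.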
  • Step C2: For each scene image, the feature coefficients of all the blocks of the scene image constitute the feature coefficient set of the scene image; within this set, the feature coefficients of each row come from different blocks, and each row of feature coefficients is all the members of one kind of feature coefficient.
  • Step C3: For each row of feature coefficients in the feature coefficient set, divide the data range of the row into 500 intervals, compute four valley bottoms working outward to both sides from the interval containing the row's median, and divide the data range of the feature coefficients into different quantization intervals by the valleys; quantize the feature coefficients in each quantization interval and encode them as hash codes, where, during encoding, the hash codes of adjacent feature coefficients differ in at most 1 bit and those of non-adjacent feature coefficients differ in more than 1 bit.
  • Step C3 achieves multi-interval non-uniform quantization of each block of the scene image.
  • Each feature coefficient can then be represented by a 3-bit binary code, so each block is represented by six 3-bit binary codes, which together form the block hash code of that block.
  • FIG. 4 is a schematic diagram of the multi-interval non-uniform quantization method when the preset number of valleys is 4 in the embodiment of the present invention.
  • In FIG. 4, the two black solid dots mark the ends of the data range of one kind of feature coefficient in the feature coefficient set; each vertical segment represents a valley, the four valleys divide the row of feature coefficients into five quantization intervals, and the quantization result of each interval is the binary code shown above it.
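A simplified sketch of the valley-based quantization of step C3 follows. It histograms one row of feature coefficients, scans outward from the median bin for bins lower than both neighbours, and uses the valley edges as interval boundaries; the scan order, tie handling, and function names are assumptions, not the patent's exact procedure.

```python
import numpy as np

def valley_quantize(values, n_bins=500, n_valleys=4):
    """Multi-interval non-uniform quantization: split the data range into
    n_bins histogram intervals, find n_valleys valley bottoms working
    outward from the median's bin, and return each value's interval index
    (0 .. n_valleys) plus the valley boundaries."""
    counts, edges = np.histogram(values, bins=n_bins)
    median_bin = np.searchsorted(edges, np.median(values)) - 1
    median_bin = min(max(median_bin, 1), n_bins - 2)
    valleys = []
    # Scan from the median bin outward, one step to each side at a time.
    for step in range(1, n_bins):
        for b in (median_bin - step, median_bin + step):
            if 0 < b < n_bins - 1 and counts[b] < counts[b - 1] and counts[b] < counts[b + 1]:
                valleys.append(edges[b])
        if len(valleys) >= n_valleys:
            break
    bounds = np.sort(np.asarray(valleys[:n_valleys]))
    # Interval indices; adjacent intervals would then be encoded so that
    # their binary codes differ in at most one bit (Gray-code style).
    return np.searchsorted(bounds, values), bounds
```

With four valleys this yields five quantization intervals, matching the five codes shown above the intervals in FIG. 4.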
  • Step 306: For each moving target image, extract the seven Hu moments of the moving target image as its feature coefficients, quantize them by a rounding operation, and generate the target hash code of the moving target image.
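The Hu-moment features of step 306 can be computed as below. The pure-NumPy implementation of the seven standard Hu invariants and the magnification factor applied before rounding (raw Hu moments are tiny) are illustrative choices; the patent only specifies "seven Hu moments" and "a rounding operation".

```python
import numpy as np

def hu_moments(img):
    """Seven Hu invariant moments of a grayscale image."""
    img = img.astype(np.float64)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def mu(p, q):                      # central moment
        return ((x - xc) ** p * (y - yc) ** q * img).sum()

    def eta(p, q):                     # normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2.0)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = e20 + e02
    h2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    h3 = (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2
    h4 = (e30 + e12) ** 2 + (e21 + e03) ** 2
    h5 = ((e30 - 3 * e12) * (e30 + e12) * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
          + (3 * e21 - e03) * (e21 + e03) * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2))
    h6 = ((e20 - e02) * ((e30 + e12) ** 2 - (e21 + e03) ** 2)
          + 4 * e11 * (e30 + e12) * (e21 + e03))
    h7 = ((3 * e21 - e03) * (e30 + e12) * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
          - (e30 - 3 * e12) * (e21 + e03) * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])

def target_hash(img, scale=1e4):
    """Quantize the Hu moments by a rounding operation (the scale factor
    is an assumed magnification, since raw Hu moments are very small)."""
    return np.round(hu_moments(img) * scale).astype(np.int64)
```

Because Hu moments are invariant to translation, rotation, and scale, this rough descriptor tolerates normal motion of the target while still reacting to content changes.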
  • Step 307: For each moving target image, treat the target hash code of the moving target image together with the position information of that target image as one target structure.
  • The order of steps 305-307 is not limited.
  • Step 308: For each frame image of the video to be authenticated, cascade the target structures of the frame's moving target images according to the preset spatial position relationship, with the last target structure pointing to the scene hash code corresponding to the frame image, thereby generating the frame image hash code.
  • Step 309: Concatenate the frame image hash codes of all frame images in the temporal order of the frames in the video to be authenticated, generating the hash code to be authenticated of the video.
  • Each SceneHash represents the scene hash code of a scene image; each Frame represents a frame image; each Rect represents a moving target image in the frame image; and each MotObjHash represents the target hash code corresponding to that moving target image.
  • The last moving target image, Rect3, points to the scene hash code SceneHash1 of the scene image in pointer fashion.
  • The frame image hash codes are cascaded in the temporal order of the video to be authenticated to generate its hash code to be authenticated.
  • Step 310 Acquire a reference hash code of the original video, where the algorithm for generating the reference hash code is the same as the algorithm for generating the hash code to be authenticated.
  • Step 311: Match the hash code to be authenticated against the reference hash code to authenticate the video to be authenticated.
  • In units of blocks, the hash code of each block is compared with the block hash code of the corresponding block in the original video, and it is determined whether the number of differing bits exceeds 1. When it does, the block is determined to be tampered with; when the number of differing bits is less than or equal to 1, the block is determined to be normal.
  • Each block has 6 feature coefficients, and each feature coefficient is represented by a 3-bit binary code.
  • The number of differing bits of the hash code of each feature coefficient can be determined by formula (8):
  • DiffBit = (Dbit1 ⊕ Obit1) + (Dbit2 ⊕ Obit2) + (Dbit3 ⊕ Obit3)    (8)
  • where DiffBit denotes the number of differing bits, Dbiti the i-th bit of the binary code of the feature coefficient in the video to be authenticated, and Obiti the i-th bit of the corresponding feature coefficient of the corresponding block in the original video.
  • When each feature coefficient is represented by a k-bit binary code, the 3 in formula (8) can be replaced with k to calculate the number of differing bits of the hash code of each feature coefficient.
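Formula (8) and the tolerance-interval block check can be sketched as follows; the string representation of the binary codes and the function names are assumptions, while the 3-bit codes and the tolerance of one differing bit follow this embodiment.

```python
def diff_bits(dbits, obits):
    """Formula (8): number of differing bits between two k-bit binary
    codes, given as strings of '0'/'1' (XOR summed over the bits)."""
    assert len(dbits) == len(obits)
    return sum(d != o for d, o in zip(dbits, obits))

def block_tampered(block_hash, ref_block_hash, tolerance=1):
    """Tolerance-interval comparison: a block is flagged as tampered only
    when any feature coefficient's code differs in more than `tolerance`
    bits (tolerance=1 is the third preset threshold in this embodiment)."""
    k = 3  # bits per feature coefficient in this embodiment
    for i in range(0, len(block_hash), k):
        if diff_bits(block_hash[i:i + k], ref_block_hash[i:i + k]) > tolerance:
            return True
    return False
```

Because adjacent quantization intervals are encoded to differ in at most one bit, a one-bit difference corresponds to a coefficient drifting into a neighbouring interval and is tolerated.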
  • In units of moving target images, for each moving target image of the video to be authenticated, the difference between the target hash code of the moving target image and the target hash code of the corresponding moving target image in the original video is calculated, together with the ratio of this difference to the target hash code of the corresponding moving target image in the original video. When the calculated ratio is greater than or equal to the fourth preset threshold, the moving target image is determined to be tampered with; when it is less than the fourth preset threshold, the moving target image is determined to be normal.
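The ratio-based moving-target check can be sketched as below. Interpreting "the ratio of the difference to the target hash code" as a sum-of-absolute-differences over the reference hash magnitude, and the 0.1 threshold, are assumptions; the patent only fixes the comparison against a fourth preset threshold.

```python
import numpy as np

def target_tampered(hash_auth, hash_ref, threshold=0.1):
    """Flag a moving target image as tampered when the difference between
    its target hash and the reference target hash, taken as a ratio of
    the reference hash, reaches the fourth preset threshold."""
    hash_auth = np.asarray(hash_auth, dtype=np.float64)
    hash_ref = np.asarray(hash_ref, dtype=np.float64)
    diff = np.abs(hash_auth - hash_ref).sum()
    ratio = diff / np.abs(hash_ref).sum()
    return ratio >= threshold
```

Small rounding drift in the Hu-moment hash yields a tiny ratio and passes; replacing the target changes the moments substantially and trips the threshold.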
  • In this embodiment, the scene image and the moving target images are separated. For the feature-rich scene image, fine perceptual hash feature extraction and multi-interval non-uniform quantization coding generate the scene hash code, and authentication uses a hash-code comparison method with a tolerance interval, which improves the robustness and hence the accuracy of video authentication and makes the method more generally applicable.
  • For the moving target images, rough perceptual hash feature extraction is performed, which improves the efficiency of feature extraction and quantization coding without affecting the authentication result.
  • Because multiple frame images share one scene image, the length of the hash code to be authenticated is reduced, which reduces the amount of hash-code matching during authentication and so increases its efficiency. Further, thanks to the block division and the position information of the moving target images, when the video to be authenticated is determined to have been tampered with, the position of the tampering can be accurately determined.
  • FIG. 6 is a schematic diagram of the device.
  • the device includes:
  • The to-be-authenticated hash code generating module 601 is configured to generate the hash code to be authenticated of the video to be authenticated and obtain the reference hash code of its original video, where the hash code to be authenticated includes, for each frame image of the video, the hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating the position of each preset area in the frame image; the generation algorithm of the reference hash code is the same as that of the hash code to be authenticated.
  • The authentication module 602 is configured to treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and to match, in units of preset areas, the hash codes of each corresponding preset area in the group.
  • the locating module 603 is configured to determine, for the hash code of each preset area, that the hash code of the preset area does not match the hash code of the corresponding preset area in the original video, and determine that the preset area is tampered with. And determining, according to the location information of the preset area, the location where the video to be authenticated is tampered with.
  • the to-be-certified hash code generating module 601 includes:
  • the processing unit 604 is configured to obtain, for each frame image of the video to be authenticated, the background image of the frame image and at least one moving target image formed by the pixels not belonging to the background image, treat each moving target image as a preset area, and generate position information of each moving target image within the frame image;
  • the scene image generating unit 605 is configured to treat frame images whose background images differ by no more than a preset gap as one set, and for each set, select the background image of one frame image from the set to generate the scene image corresponding to every frame image in the set;
  • the scene hash code generating unit 606 is configured to divide each scene image into blocks, treat each block as a preset area, generate the block hash code of each block, and generate the scene hash code of the scene image from all of its block hash codes;
  • a target hash code generating unit 607 configured to generate, for each moving target image, a target hash code of the moving target image
  • the to-be-certified hash code generating unit 608 is configured to generate the to-be-authenticated hash code of the to-be-authenticated video according to the scenario hash code, the target hash code, and the location information of the moving target image.
  • the to-be-certified hash code generating unit includes:
  • a determining subunit, configured to determine an identifier for the scene hash code of each scene image, and to treat the target hash code of each moving target image together with the position information of that target image as one target structure;
  • a frame image hash code generating subunit configured to cascade the target structure of the moving target image of the frame image according to a preset spatial position relationship for each frame image of the to-be-authenticated video, and preset Adding an identifier of the scene image corresponding to the frame image to generate a frame image hash code;
  • the to-be-certified hash code generating sub-unit is configured to generate a hash code to be authenticated of the to-be-authenticated video according to a frame image hash code of each frame image according to a timing relationship of the to-be-authenticated video.
  • the scenario hash code generating unit includes:
  • the perceptual hash feature extraction subunit is configured to divide each scene image into blocks, treat each block as a preset area, and perform perceptual hash feature extraction on each block;
  • a feature coefficient set generation subunit configured to generate a feature coefficient set of each scene image according to the perceptual hash feature extraction result
  • a coding subunit configured to calculate, for each feature coefficient in the feature coefficient set, a preset number of valleys in a data range in which the feature coefficient is located, and divide the data range of the feature coefficient into Different quantization intervals; and the feature coefficients in each quantization interval are quantized and encoded as a hash code; wherein, at the time of encoding, the number of bits of the different bits of the hash code of the adjacent feature coefficients is less than or equal to the first pre- The threshold is set, the number of bits of the different bits of the non-adjacent feature coefficient is greater than the second preset threshold, wherein the first preset threshold is less than or equal to the second preset threshold;
  • the scene hash code generating subunit is configured to compose a block hash code from the hash codes of each block of the scene image, and generate the scene hash code of a scene image from all of its block hash codes.
  • the authentication module 602 includes:
  • the block matching unit 609 is configured to: when the preset area is a block of the scene image, treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, compare, in units of blocks, the hash code of each block with the block hash code of the corresponding block in the original video, and determine whether the number of differing bits in the two block hash codes exceeds a third preset threshold;
  • the first determining unit 610 is configured to: when the number of bits of the different bits exceeds the third preset threshold, determine that the block is tampered; and when the number of bits of the different bits is less than or equal to the third preset threshold, determine that the block is normal.
  • the authentication module 602 includes:
  • the moving target image matching unit 611 is configured to: when the preset area is a moving target image, treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and calculate, in units of moving target images, for each moving target image, the difference between its hash code and the target hash code of the corresponding moving target image in the original video, as well as the ratio of this difference to the target hash code of the corresponding moving target image in the original video;
  • the second determining unit 612 is configured to: when the calculated ratio is greater than or equal to the fourth preset threshold, determine that the moving target image is tampered; and when the calculated ratio is less than the fourth preset threshold, determine that the moving target image is normal.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video authentication method and device. The method includes: generating a hash code to be authenticated of a video to be authenticated, and obtaining a reference hash code of the original video of the video to be authenticated, where the hash code to be authenticated includes, for each frame image of the video to be authenticated, a hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating the position of each preset area in the frame image; the generation algorithm of the reference hash code is the same as that of the hash code to be authenticated; for the hash code of each preset area, when it does not match the hash code of the corresponding preset area in the original video, determining that the preset area has been tampered with, and determining the tampered position of the video to be authenticated from the position information of the preset area. The method provided by the present invention improves the robustness of video authentication and accurately locates the tampered position.

Description

Video authentication method and device
This application claims priority to Chinese Patent Application No. 201410713432.7, filed with the Chinese Patent Office on November 28, 2014 and entitled "Video authentication method and device", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communication information security, and in particular to a video authentication method and device.
Background
Video is generally transmitted to users over public transmission channels, during which it may be maliciously tampered with: frames may be deleted or reordered, or the content of frame images may be modified. When a video concerns public safety, politics, the military, or courtroom evidence, its credibility must be evaluated, which has given rise to video authentication technology. Video authentication is a key technology in video information security evaluation and early-warning systems: by processing the video content it generates a distinguishable identifier through which the security of the video content is authenticated. Current video authentication mainly relies on digital watermarking or perceptual hashing.
With perceptual hashing, the original video is processed to obtain content-distinguishing features, which form a video fingerprint describing the video content. The same method produces a fingerprint for the video to be authenticated, and a similarity-comparison algorithm then authenticates it. For example: each frame image is divided into random, mutually overlapping blocks; the blocks are numbered, and the average luminance difference is computed for blocks whose numbers have a preset relationship, generating a structural hash vector and a temporal hash sequence. A structural hash distance is determined from the structural hash vector and a temporal hash distance from the temporal hash sequence; weighting the two yields the hash distance between the original video and the video to be authenticated, which is compared against a set threshold to judge whether the video has been tampered with. However, this method can only decide whether the video was tampered with, not locate the tampering.
With digital watermarking, an image with perceptible content is typically used as a watermark and embedded into all frames, or specific frames, of the target video by an embedding algorithm; at the receiving end an extraction algorithm recovers the embedded watermark, and evaluating the integrity and correctness of the recovered watermark authenticates the video content. However, watermark embedding is relatively complex, which hinders video authentication. Moreover, a digital watermark affects the quality of the video images, making watermark-based video authentication inefficient.
Given these shortcomings, a new video authentication method is needed that is algorithmically simple, does not affect video image quality, and can accurately locate the tampered position.
Summary of the Invention
The object of the present invention is to provide a video authentication method and device that overcome the problems in the related art: perceptual hashing cannot locate the tampered position, while digital watermarking uses complex algorithms and affects video image quality.
In one aspect, the present invention provides a video authentication method, including:
generating a hash code to be authenticated of a video to be authenticated, and obtaining a reference hash code of the original video of the video to be authenticated, where the hash code to be authenticated includes, for each frame image of the video to be authenticated, a hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating the position of each preset area in the frame image, and the generation algorithm of the reference hash code is the same as that of the hash code to be authenticated;
treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and matching, in units of preset areas, the hash codes of each corresponding preset area in the group; and
for the hash code of each preset area, when it does not match the hash code of the corresponding preset area in the original video, determining that the preset area has been tampered with, and determining the tampered position of the video to be authenticated from the position information of the preset area.
In one embodiment, generating the hash code to be authenticated of the video to be authenticated includes:
for each frame image of the video to be authenticated, obtaining the background image of the frame image and at least one moving target image formed by the pixels not belonging to the background image, treating each moving target image as a preset area, and generating position information of each moving target image within the frame image;
treating frame images whose background images differ by no more than a preset gap as one set, and for each set, selecting the background image of one frame image from the set to generate the scene image corresponding to every frame image in the set;
for each scene image, dividing the scene image into blocks, treating each block as a preset area, generating the block hash code of each block, and generating the scene hash code of the scene image from all of its block hash codes;
for each moving target image, generating the target hash code of the moving target image;
generating the hash code to be authenticated of the video to be authenticated from the scene hash codes, the target hash codes, and the position information of the moving target images.
In one embodiment, generating the hash code to be authenticated from the scene hash codes, the target hash codes, and the position information of the moving target images includes:
determining an identifier for the scene hash code of each scene image, and treating the target hash code of each moving target image together with the position information of that target image as one target structure;
for each frame image of the video to be authenticated, cascading the target structures of the frame's moving target images according to a preset spatial position relationship, adding the identifier of the scene image corresponding to the frame image at a preset position, and generating the frame image hash code;
cascading the frame image hash codes of all frame images in the temporal order of the video to be authenticated to generate the hash code to be authenticated.
In one embodiment, dividing each scene image into blocks, treating each block as a preset area, generating the block hash code of each block, and generating the scene hash code of the scene image from all of its block hash codes includes:
for each scene image, dividing the scene image into blocks, treating each block as a preset area, and performing perceptual hash feature extraction on each block;
generating the feature coefficient set of each scene image from the perceptual hash feature extraction results;
for each kind of feature coefficient in the set, computing a preset number of valley bottoms within the data range of that kind of coefficient, dividing the data range into different quantization intervals by the valleys, quantizing the coefficients in each quantization interval, and encoding them as hash codes, where, during encoding, the hash codes of adjacent feature coefficients differ in at most a first preset threshold of bits and those of non-adjacent feature coefficients differ in more than a second preset threshold of bits, the first preset threshold being less than or equal to the second;
composing a block hash code from the hash codes of each block, and generating the scene hash code of a scene image from all of its block hash codes.
In one embodiment, when the preset area is a block of a scene image, treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated and matching the hash codes of each corresponding preset area includes:
treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated; in units of blocks, comparing the hash code of each block with the block hash code of the corresponding block in the original video, and judging whether the number of differing bits in the two block hash codes exceeds a third preset threshold;
when the number of differing bits exceeds the third preset threshold, determining that the block has been tampered with; when it is less than or equal to the third preset threshold, determining that the block is normal.
In one embodiment, when the preset area is a moving target image, treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated and matching the hash codes of each corresponding preset area includes:
treating images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated; in units of moving target images, calculating, for each moving target image, the difference between its hash code and the target hash code of the corresponding moving target image in the original video, and the ratio of this difference to the target hash code of the corresponding moving target image in the original video;
when the calculated ratio is greater than or equal to a fourth preset threshold, determining that the moving target image has been tampered with; when it is less than the fourth preset threshold, determining that the moving target image is normal.
In another aspect, the present invention further provides a video authentication device, including:
a to-be-authenticated hash code generating module, configured to generate the hash code to be authenticated of a video to be authenticated and obtain the reference hash code of its original video, where the hash code to be authenticated includes, for each frame image of the video, the hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating the position of each preset area in the frame image, and the generation algorithm of the reference hash code is the same as that of the hash code to be authenticated;
an authentication module, configured to treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and match, in units of preset areas, the hash codes of each corresponding preset area in the group;
a locating module, configured to determine, for the hash code of each preset area, that the preset area has been tampered with when its hash code does not match that of the corresponding preset area in the original video, and to determine the tampered position of the video to be authenticated from the position information of the preset area.
In one embodiment, the to-be-authenticated hash code generating module includes:
a processing unit, configured to obtain, for each frame image of the video to be authenticated, the background image of the frame image and at least one moving target image formed by the pixels not belonging to the background image, treat each moving target image as a preset area, and generate position information of each moving target image within the frame image;
a scene image generating unit, configured to treat frame images whose background images differ by no more than a preset gap as one set, and for each set, select the background image of one frame image from the set to generate the scene image corresponding to every frame image in the set;
a scene hash code generating unit, configured to divide each scene image into blocks, treat each block as a preset area, generate the block hash code of each block, and generate the scene hash code of the scene image from all of its block hash codes;
a target hash code generating unit, configured to generate, for each moving target image, the target hash code of the moving target image;
a to-be-authenticated hash code generating unit, configured to generate the hash code to be authenticated of the video from the scene hash codes, the target hash codes, and the position information of the moving target images.
In one embodiment, the to-be-authenticated hash code generating unit includes:
a determining subunit, configured to determine an identifier for the scene hash code of each scene image, and to treat the target hash code of each moving target image together with the position information of that target image as one target structure;
a frame image hash code generating subunit, configured to cascade, for each frame image of the video to be authenticated, the target structures of the frame's moving target images according to a preset spatial position relationship, add the identifier of the corresponding scene image at a preset position, and generate the frame image hash code;
a to-be-authenticated hash code generating subunit, configured to cascade the frame image hash codes of all frame images in the temporal order of the video to be authenticated and generate the hash code to be authenticated.
In one embodiment, the scene hash code generating unit includes:
a perceptual hash feature extraction subunit, configured to divide each scene image into blocks, treat each block as a preset area, and perform perceptual hash feature extraction on each block;
a feature coefficient set generating subunit, configured to generate the feature coefficient set of each scene image from the perceptual hash feature extraction results;
an encoding subunit, configured to compute, for each kind of feature coefficient in the set, a preset number of valley bottoms within the data range of that kind of coefficient, divide the data range into different quantization intervals by the valleys, quantize the coefficients in each interval, and encode them as hash codes, where, during encoding, the hash codes of adjacent feature coefficients differ in at most a first preset threshold of bits and those of non-adjacent feature coefficients differ in more than a second preset threshold of bits, the first preset threshold being less than or equal to the second;
a scene hash code generating subunit, configured to compose a block hash code from the hash codes of each block and generate the scene hash code of a scene image from all of its block hash codes.
In one embodiment, the authentication module includes:
a block matching unit, configured to, when the preset area is a block of a scene image, treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, compare, block by block, the hash code of each block with the block hash code of the corresponding block in the original video, and judge whether the number of differing bits in the two block hash codes exceeds a third preset threshold;
a first determining unit, configured to determine that the block has been tampered with when the number of differing bits exceeds the third preset threshold, and that the block is normal when it is less than or equal to the third preset threshold.
In one embodiment, the authentication module includes:
a moving target image matching unit, configured to, when the preset area is a moving target image, treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and calculate, for each moving target image, the difference between its hash code and the target hash code of the corresponding moving target image in the original video, as well as the ratio of this difference to the target hash code of the corresponding moving target image in the original video;
a second determining unit, configured to determine that the moving target image has been tampered with when the calculated ratio is greater than or equal to a fourth preset threshold, and that it is normal when the ratio is less than the fourth preset threshold.
The present invention has at least the following beneficial effects. In the video authentication method provided by the embodiments of the present invention, the hash codes of the original video and of the video to be authenticated are obtained with the same procedure, and video authentication is performed by matching the two hash codes; compared with watermark-based authentication, this is easy to implement and does not affect image quality. Separating each video image into a scene image and moving target images, and generating hash codes for each separately, makes it possible to judge independently whether the scene image or a moving target image has been tampered with; further, the specific tampered position of the scene image can be determined from its blocks, and the tampered position of a moving target image from its position information. In sum, the method neither affects the image quality of the video nor fails to locate the tampered position accurately. In addition, existing watermark-based authentication with semi-fragile watermarks has low robustness to some normal operations (such as smoothing or translating the video images), so the video authentication method provided by the embodiments of the present invention is more generally applicable. By using multi-interval non-uniform quantization coding when generating the hash codes and a hash-code matching method with a tolerance interval during authentication, robustness is improved over existing digital watermarking, and with it the accuracy of video authentication.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Brief Description of the Drawings
FIG. 1 is a first exemplary flowchart of the video authentication method in an embodiment of the present invention;
FIG. 2 is a second exemplary flowchart of the video authentication method in an embodiment of the present invention;
FIG. 3 is a third exemplary flowchart of the video authentication method in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the multi-interval non-uniform quantization method of the video authentication method in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the organization of the hash code to be authenticated in the video authentication method in an embodiment of the present invention;
FIG. 6 is a first schematic diagram of the video authentication device in an embodiment of the present invention;
FIG. 7 is a second schematic diagram of the video authentication device in an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the present invention, not to limit it, and that, where no conflict arises, the embodiments and the features in them may be combined with one another.
The embodiments of the present invention provide a video authentication method and device that are particularly suitable for authenticating surveillance video. Moreover, the method and device are applicable to videos of different resolutions and frame rates, which the embodiments of the present invention do not limit.
FIG. 1 is a flowchart of the video authentication method provided by an embodiment of the present invention. In this method, the hash codes of the original video and of the video to be authenticated are obtained with the same algorithm, and authentication is performed by matching the two; compared with watermark-based authentication this is easy to implement and does not affect image quality. Separating each video image into a scene image and moving target images, and generating hash codes for each separately, makes it possible to judge independently whether the scene image or a moving target image has been tampered with; further, the specific tampered position of the background portion of a frame image can be determined from the blocks of the scene image, and the tampered position of a moving target image in a frame from its position information. Thus the method neither affects the image quality of the video nor fails to locate the tampered position accurately. Moreover, existing watermark-based authentication with semi-fragile watermarks has low robustness to some normal operations (such as smoothing or adding noise to the video images), so the method provided by the embodiments of the present invention is more generally applicable. By using multi-interval non-uniform quantization coding when generating the hash codes and a hash-code matching method with a tolerance interval during authentication, robustness, and thus the accuracy of video authentication, is improved over existing digital watermarking. The video authentication method provided by the embodiments of the present invention is described in detail below.
Embodiment 1
FIG. 2 is an exemplary flowchart of the video authentication method in an embodiment of the present invention, which includes the following steps:
Step 201: Generate the hash code to be authenticated of the video to be authenticated and obtain the reference hash code of its original video, where the hash code to be authenticated includes, for each frame image of the video, the hash code of each preset area obtained by dividing the image area of the frame image, and position information indicating the position of each preset area in the frame image; the generation algorithm of the reference hash code is the same as that of the hash code to be authenticated.
Here the scene image characterizes the background image of the video: each frame image is divided into a background image, represented by the scene image, and moving target images composed of pixels determined not to belong to the background image; a frame image may contain at least one moving target image.
Step 202: Treat images with the same frame number in the video to be authenticated and the original video as a group of images to be authenticated, and match, in units of preset areas, the hash codes of each corresponding preset area in the group.
Step 203: For the hash code of each preset area, when it does not match the hash code of the corresponding preset area in the original video, determine that the preset area has been tampered with, and determine the tampered position of the video to be authenticated from the position information of the preset area.
The contents of the above steps are described in detail below.
In one embodiment, in step 201, generating the hash code to be authenticated of the video to be authenticated includes the following steps:
Step A1: For each frame image of the video to be authenticated, obtain the background image of the frame image and at least one moving target image formed by the pixels not belonging to the background image, treat each moving target image as a preset area, and generate position information of each moving target image within the frame image.
Step A2: Treat frame images whose background images differ by no more than a preset gap as one set, and for each set, select the background image of one frame image from the set to generate the scene image corresponding to every frame image in the set.
Step A3: For each scene image, divide the scene image into blocks, treat each block as a preset area, generate the block hash code of each block, and generate the scene hash code of the scene image from all of its block hash codes.
Step A4: For each moving target image, generate the target hash code of the moving target image.
Step A5: Generate the hash code to be authenticated of the video from the scene hash codes, the target hash codes, and the position information of the moving target images.
Steps A1-A5 are explained below.
In one embodiment, any existing method of separating the scene image from the moving target images in step A1 is applicable to the embodiments of the present invention, which are not limited in this respect. In particular, the background image and the moving target images can be extracted by establishing a background model. Doing so raises the questions of how to establish the background model, how to generate the scene image corresponding to each frame image, how to determine which pixels in the video belong to a moving target image, and how to determine the position information of a moving target image; these are addressed in turn below:
1) Establishing the background model
In one embodiment, the background model may be established by multi-sample modeling, and the scene image updated accordingly. Specifically, the background model is established from multiple samples as follows:
the background model is initialized on the first frame image of the video to be authenticated: for each pixel of the first frame, N samples are drawn at random within its preset neighborhood to form the initial background model. In this background model, each pixel corresponds to a sample model consisting of a preset number of samples, each sample having a sample value. The sample model can be expressed by formula (1):
Model(i,j) = {sample1, sample2, …, sampleN}      (1)
where Model(i,j) denotes the sample model of point (i,j), sample denotes a sample value in the sample model, and N is the number of samples.
In one embodiment, the sample value of each sample may be represented by the pixel value of that sample.
2) Extracting the moving target images
Specifically, after the background model has been established by multi-sample modeling, the moving target images may be extracted as follows: for each pixel of each frame image of the video to be authenticated, compare the pixel value with every sample value in the sample model at the same position in the current background model, computing the sample gap between the pixel value and each sample value, which yields N sample gaps, where N is the number of samples in each sample model. When the number of sample gaps smaller than the sample threshold exceeds a preset count, the pixel is determined to be a background point (i.e., it belongs to the scene image); otherwise it is determined to be a pixel of a moving target image. The sample gap between the pixel value and a sample value can be determined by formula (2):
diff = |value(i,j) - samplei|    (2)
where diff denotes the sample gap, value(i,j) the pixel value of the pixel at point (i,j), and samplei a sample value in the sample model at point (i,j).
3) Updating the background model
The backgrounds of the frame images of the video to be authenticated may differ considerably, and a fixed background model cannot accurately describe the background image of every frame, nor, therefore, accurately extract the moving target images. To describe the background features of each frame accurately and thus extract the moving target images accurately, the background model must be updated, and the updated model used for extraction.
A specific update method is as follows: starting from the second frame image, for each pixel of each frame, when the pixel is determined not to belong to a moving target image (i.e., it is a background point of the background image), one random sample value in the sample model at a random position within the pixel's preset neighborhood may be updated with a uniformly distributed random probability over [0, θ-1]. For example, with an 8-neighborhood, a pixel can belong to at most eight preset neighborhoods; one of these is chosen at random, a sample model at a random position within it is chosen, and a random sample value in that model is updated. Preferably, with an 8-neighborhood and 20 samples per sample model, θ may be determined empirically, for example as 16. It should be noted that any existing method of updating the scene image is applicable to the embodiments of the present invention and is not limited here.
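The conservative update above can be sketched as follows. The neighbour selection (which here may also pick the pixel's own position) and the function signature are simplifying assumptions; the 1/θ update probability with θ = 16 follows the value suggested in the text.

```python
import numpy as np

def update_background_model(model, i, j, pixel_value, theta=16, rng=None):
    """When pixel (i, j) is judged to be background, with probability
    1/theta replace one random sample of the sample model at a random
    position in its neighborhood with the pixel's value."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.integers(0, theta) == 0:          # uniform random chance over [0, theta-1]
        h, w, n = model.shape
        di, dj = rng.integers(-1, 2), rng.integers(-1, 2)
        ni = int(np.clip(i + di, 0, h - 1))  # stay inside the frame
        nj = int(np.clip(j + dj, 0, w - 1))
        model[ni, nj, rng.integers(0, n)] = pixel_value
    return model
```

Updating neighbours' sample models (rather than only the pixel's own) is what lets the model absorb gradual background change without ghosting.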
The extraction process can thus be described as follows: for the first frame image of the video to be authenticated, the moving target images are extracted according to the initialized background model; in the subsequent extraction process, the background model is updated from the background points. Once updated, the model is used to continue extracting moving target images, and so on: the background model is updated while moving target images are extracted, and after each update the updated model is used for extraction.
4) Generating the scene image
In practice, when the scene of a video is relatively stable and unlikely to change, the background image changes little over a short time, as in surveillance video. For videos with this characteristic, multiple frame images can share one scene image.
In one embodiment, step A2 ("treat frame images whose background images differ by no more than a preset gap as one set, and for each set, select the background image of one frame image from the set to generate the scene image corresponding to every frame image in the set") can be illustrated as follows. Suppose the video to be authenticated has 100 frame images, numbered 1-100, and the gaps among the 30 background images of frames 1-30 are all within the preset gap; frames 1-30 then form one set, and the background image of one frame in the set is selected to generate the scene image of every frame in the set. Similarly, when the background images of the remaining frames 31-100 differ by no more than the preset gap, frames 31-100 share another scene image. Thus only two scene images describe the background images of one hundred frames. In the embodiments of the present invention, each frame image can therefore be represented by a scene image and moving target images. Because multiple frames share one scene image, when the hash code of the video is later generated, replacing the background images of many frames with the scene image shortens the hash code and makes hash-code matching, and hence video authentication, more efficient.
In one embodiment, when the scene image is generated through the background model described above, its generation comprises two processes, establishment and update, as follows:
a. Establishing the scene image
After the background model is initialized from the first frame image of the video to be authenticated, the moving target images of the first frame can be extracted from the model, and the pixels not belonging to the moving target images form the background image of the first frame. The background image of the first frame generates the initial scene image, after which processing moves to the next frame.
For the second frame image, the background model is used to extract its moving target images and background image, and the background pixels continue to update the model. The scene image generated from the background image of the second frame is compared with the first scene image: when the comparison does not satisfy the scene-update condition (i.e., the gap between the background images is within the preset gap), the initial scene image continues to serve as the scene image of the second frame; when it does (i.e., the gap exceeds the preset gap), the scene image generated from the background image of the second frame becomes the updated scene image and serves as the scene image of the second frame. Processing then moves to the next frame.
The third frame image through the last frame image of the video to be authenticated are processed in the same way as the second frame, which is not repeated here.
In summary, for the current frame image, when the gap between its background image and the current scene image is within the preset gap, the gap between its background image and that of the previous frame is within the preset gap, and the current frame is assigned to the set of the previous frame. Conversely, when the gap between the current frame's background image and the current scene image exceeds the preset gap, the current frame cannot be assigned to the previous frame's set; a new set is started from the current frame, and a new scene image serves as the scene image of every frame in the new set. By repeating this process, the moving target images of every frame and the scene image corresponding to every frame are obtained.
In one embodiment, any background image in a set may also be selected to generate the scene image of that set, which the present invention does not limit.
In one embodiment, generating the scene image from the background image may be performed as follows: in a frame image, the pixel values of the pixels outside the background image are filled in according to a preset rule, for example with a default pixel value or with a value selected from the corresponding position in the background model. It should be noted that filling with the value at a preset position of the corresponding location in the background model reduces the gap caused by random value selection, for example the gap between corresponding scene images of the video to be authenticated and of the original video.
b. Updating the scene image
The scene image can be updated once the background image has changed to a certain degree, so that multiple frames correspond to one scene image. In one embodiment, the current scene image can be updated by computing the frame difference between the scene image generated from the currently extracted background image and the current scene image. The frame difference can be expressed as a ratio, obtained as follows: first compute the pixel-value gaps between corresponding positions of the scene image generated from the current frame's background image and the current scene image, then divide the number of pixels whose gap exceeds a preset frame-difference gap by the total number of pixels in the whole frame; the resulting ratio expresses the frame difference, and when it exceeds a preset frame-difference threshold, the current scene image is updated. The frame difference can be computed by formula (3):
ratio = NUMchange / NUMtotal    (3)
where ratio denotes the frame difference, NUMchange the number of pixels whose pixel-value gap exceeds the preset frame-difference gap, and NUMtotal the total number of pixels in the whole frame.
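Formula (3) can be sketched directly; the per-pixel gap of 30 is an illustrative default, since the patent leaves the preset frame-difference gap unspecified.

```python
import numpy as np

def frame_diff_ratio(new_scene, cur_scene, pixel_gap=30):
    """Formula (3): ratio = NUM_change / NUM_total, where NUM_change
    counts pixels whose value differs from the current scene image by
    more than the preset per-pixel gap."""
    changed = np.abs(new_scene.astype(np.int32) - cur_scene.astype(np.int32)) > pixel_gap
    return changed.sum() / changed.size
```

The scene image is replaced when this ratio exceeds the preset frame-difference threshold; otherwise the current scene image continues to be shared.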
It should be noted that the scene image may also be updated by computing the color difference between the scene image generated from the currently extracted background image and the current scene image. Specifically:
Method 1: compute the color difference between corresponding pixels of the scene image generated from the extracted background image and the current scene image, and the average color difference between the two images; when the average exceeds a color-difference threshold, update the scene image; when it does not, do not update it.
Method 2: compute the color difference between corresponding pixels of the two images, count the pixels whose color difference exceeds the color-difference threshold, and update the scene image when the ratio of this count to the number of pixels in the whole frame exceeds a preset ratio; otherwise, do not update it.
It should be noted that any method that can express the gap between images so as to decide whether to update the scene image is applicable to the embodiments of the present invention, which are not limited in this respect.
Thus, in the embodiments of the present invention, updating the scene image improves the accuracy with which the scene image describes the background image while still letting multiple frames share one scene image.
5) Determining the position information of the moving target images
After the background model is established by multi-sample modeling and the moving target images are extracted, each moving target image is a set of at least one pixel, and its position information can be determined from the positions of its pixels in the frame image. Specifically, a moving target image can be described by the smallest shape that can contain it, for example a rectangle, square, or circle, and the position of that shape in the frame image expresses the position of the moving target image in the video to be authenticated. For example, with a rectangle, the position information can be expressed by the coordinates of the rectangle's top-left vertex in the frame image together with the rectangle's width and height; specifically, a moving target image can be expressed by formula (4):
recti = {xi, yi, widthi, heighti},  i = 1, 2, 3, …, n    (4)
where recti denotes the i-th moving target image, (xi, yi) the coordinates of the rectangle's top-left vertex (expressed by its position in the frame image containing the moving target image), widthi the width of the rectangle, and heighti its height.
The moving target images may be numbered per frame image, in which case n in formula (4) varies from frame to frame; or per entire video to be authenticated, in which case n is a fixed value for a given video. Any existing method of expressing the position of a moving target image is applicable to the embodiments of the present invention, which are not limited in this respect.
Having obtained the scene images and moving target images in step A1, the scene hash codes of the scene images and the target hash codes of the moving target images can now be generated separately. Specifically, generating the scene hash code of a scene image may include the following.
In one embodiment, scene hash code generation comprises two parts, perceptual hash feature extraction and feature quantization coding, so step A3 can be performed as the following steps:
Step A31: For each scene image, divide the scene image into blocks; treat each block as a preset area and perform perceptual hash feature extraction on each block.
In one embodiment, the scene image may be divided uniformly or non-uniformly. Non-uniform division means, for example, that when the image information characterizing the scene image is concentrated in one region, that region can be divided into many small blocks, while a region with little feature variation, for example one of uniform color, is divided into one larger block.
A perceptual hash is a class of one-way mappings from multimedia data sets to perceptual digest sets: multimedia digital representations with the same perceptual content are uniquely mapped to one digital digest, satisfying perceptual robustness and security. After perceptual hash feature extraction, the data volume of the original image's data set is markedly reduced, and the perceptual digest set represents the features of the original image.
In one embodiment, since the scene image contains most of the information of the frame image, its perceptual hash feature extraction should capture features as fine as possible; a block-based transform-domain extraction method may therefore be used, such as the block DCT (Discrete Cosine Transform), block DWT (Discrete Wavelet Transform), or block DFT (Discrete Fourier Transform). Any method that can extract fine perceptual hash features is applicable to the embodiments of the present invention, which are not limited in this respect.
Step A32: Generate the feature coefficient set of each scene image from the perceptual hash feature extraction results.
For example, when perceptual hash features are extracted by the DCT for each block, the DCT coefficient matrix of the block is obtained, and a preset first number of DC coefficients and a preset second number of AC coefficients are taken from it as the feature coefficients of the block's feature coefficient matrix. Preferably, to characterize the scene image effectively, the coefficients in the DCT coefficient matrix are sorted in descending order, and when a preset number of coefficients (the first or second number) is taken, the top-ranked coefficients are selected. Alternatively, the preset number of AC coefficients may be selected by their position in the DCT coefficient matrix, for example those at positions where the sum of the row index and column index is less than a preset sum. For one scene image, the feature coefficient matrices of all its blocks form the feature coefficient set of the scene image. The feature coefficient matrix of each block can be expressed by formula (5), and the feature coefficient set of each scene image by formula (6):
blocki = {Y1, Y2, …, Yn1, Yn1+1, …, Yn1+n2}    (5)
where blocki denotes the feature coefficient matrix of the i-th block, Y a feature coefficient in the matrix, n1 the first coefficient number, and n2 the second coefficient number. From formula (5), the feature coefficient set of the whole scene image follows as formula (6):
FEATURE = {block1, block2, …, blockN}    (6)
where FEATURE denotes the feature coefficient set; block1, block2, and blockN denote the feature coefficient matrices of the blocks numbered 1, 2, and N of the scene image; and N is the number of blocks of the scene image.
步骤A33:针对特征系数集中的每一种特征系数,在该种特征系数所在的数据范围内,计算出预设数量的谷底,由谷底将该种特征系数的数据范围划分为不同的量化区间;并将每一个量化区间内的特征系数进行量化,并编码为哈希码;其中,在编码时,相邻特征系数的哈希码的不同位的位数小于等于第一预设阈值,非相邻特征系数的不同位的位数大于第二预设阈值,其 中,第一预设阈值小于等于第二预设阈值。
步骤A34:由每一块场景图像的哈希码组成一个块哈希码,并由一个场景图像的所有块哈希码生成该场景图像的场景哈希码。
其中,在步骤A33中,每一个分块的特征系数矩阵中的每一个元素为一种特征系数,例如,继续沿用前面的例子,若每块场景图像通过DCT变换后,获取5个AC系数,1个DC系数,则特征系数的种类为6(即5+1)。当由公式(6)形成特征系数集时,该特征系数集中的每一行特征系数为由N个来自不同的分块的特征系数组成,每一行特征系数为一种特征系数的所有成员。
其中,在一个实施例中,获得场景图像的特征系数集之后,为了减少特征系数集中的冗余信息,同时减少传输成本,需要对特征系数集进行量化。量化方法要保证减少冗余信息的同时保证视频认证结果的鲁棒性和敏感性,具体的,在本发明实施例中通过多区间非均匀量化的方法,对特征系数集进行量化编码,具体的,仍然延续前述的例子,每块场景图像通过DCT变换后,获取5个AC系数,1个DC系数,特征系数集中每一行数据为一种特征系数,对每一种特征系数的量化可以执行为以下操作:
步骤A33-1:将特征系数集的每一行数据,根据该行特征系数的数据范围和预设规则,计算该种特征系数的数据直方图。
具体的,将每行特征系数的最大值和最小值所确定的数据范围,分成预设个数的区间,该预设个数的区间用于进行直方图统计,以统计出每个区间包含的特征系数的数量。
步骤A33-2:从每行特征系数的中值所在的区间开始,向两边计算直方图的谷底。
较佳的,计算谷底的方法为:从中值所在的区间向两边计算,如果当前区间包含的特征系数的数量小于相邻的两个区间各自包含的特征系数的数量,则将当前区间记为谷底。若预设量化区间个数为m时,则需要找到m-1个谷底。较佳的,为了满足对称性的需求,量化区间的个数m可以为奇数。
步骤A33-3:将谷底作为量化区间的界限值,对每个量化区间内的特征系数进行量化编码。
其中,较佳的,在编码方式设计上,相邻特征系数的哈希码的不同位的位数小于等于第一预设阈值,非相邻特征系数的不同位的位数大于第二预设阈值,其中,第一预设阈值小于等于第二预设阈值。较佳的,两特征系数的距离越远,则这两特征系数的哈希码的不同位的位数越多,这样可以根据哈希码不同位的数量来确定对应特征系数的哈希码的差距。
在执行步骤A33之后,实现了对特征系数集进行量化,较佳的,可以将每块场景图像的哈希码按照预设顺序(该预设顺序例如是按照空间顺序自上而下,由左至右的顺序)首尾相连,可以得到场景图像的场景哈希码。需要说明的是,任何有序连接各分块的哈希码的方法均适用于本发明实施例,对此不做限定。
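上述步骤A33-1至A33-3的多区间非均匀量化,可以用如下 Python 草图示意(假设可使用 numpy;谷底搜索策略、格雷码码表 GRAY_CODES 以及 find_valleys、quantize_row 等名称均为示例性假设,这里以 4 个谷底、5 个量化区间、3 位二进制码为例,与实施例二中的取值一致):

```python
import numpy as np

# 4 个谷底确定 5 个量化区间,编码为 3 位二进制码;
# 相邻区间的码字只有 1 位不同(格雷码思路),非相邻区间不同位更多。
GRAY_CODES = ["000", "001", "011", "010", "110"]

def find_valleys(row: np.ndarray, n_bins: int = 500, n_valleys: int = 4):
    """从一行特征系数的直方图中,由中值所在的 bin 向两边寻找谷底
    (包含的系数数量少于左右相邻 bin 的 bin)。"""
    counts, edges = np.histogram(row, bins=n_bins)
    mid = int(np.searchsorted(edges, np.median(row))) - 1
    mid = min(max(mid, 1), n_bins - 2)
    valleys = []
    for off in range(0, n_bins):                # 由中间向两边扩散搜索
        cand = (mid,) if off == 0 else (mid - off, mid + off)
        for i in cand:
            if 1 <= i <= n_bins - 2 and counts[i] < counts[i - 1] and counts[i] < counts[i + 1]:
                valleys.append(edges[i])        # 以谷底 bin 的左边界作为量化界限
                if len(valleys) == n_valleys:
                    return sorted(valleys)
    return sorted(valleys)

def quantize_row(row: np.ndarray, valleys) -> list:
    """按谷底划分的量化区间,把每个特征系数编码为二进制哈希码。"""
    idx = np.searchsorted(valleys, row)         # 每个系数落在第几个量化区间
    return [GRAY_CODES[min(i, len(GRAY_CODES) - 1)] for i in idx]
```

按此编码,相邻量化区间的哈希码恰好只差 1 位,跨越整个相邻区间才会出现 2 位以上的差异,这正是后续带容忍区间比对的基础。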
对于步骤A4,生成运动目标图像的目标哈希码的过程,可包括以下内容:
其中,运动目标图像的目标哈希码的生成过程,也包括感知哈希特征提取和特征量化编码两部分,其中,运动目标图像的哈希码生成的方法可以与场景图像的哈希码的生成方法相同,也可以不同。在本发明实施例中,当运动目标图像占整帧图像的比例大于等于预设比例时,即运动目标图像占整帧图像的较大部分时,可以采用与场景图像相同的哈希码生成方法,当运动目标图像占整帧图像的比例小于预设比例时,即运动目标图像占整帧图像的较小部分时,由于运动目标图像本身的特征比较明显,因此只需要提取其粗略的整体特征即可,此时,例如可以采用提取整体的Hu矩来粗略代表运动目标图像的整体特征,具体的,通过求取运动目标图像的7个Hu矩作为运动目标图像的特征系数。然后,通过取整操作实现对7个特征系数的量化,这样便于实现和快速生成运动目标图像的哈希码,从而提高视频认证的速度。需要说明的是,现有技术中实现对运动目标图像的特征系数矩阵的量化方法均适用于本发明实施例,本发明实施例对此不做限定。
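对于占比例较小的运动目标图像,上段所述"7 个 Hu 矩 + 取整量化"的思路可以用如下 Python 草图示意(假设可使用 numpy;hu_moments、target_hash 及放大倍数 scale 均为示例性假设,Hu 矩采用通用的归一化中心矩定义):

```python
import numpy as np

def hu_moments(img: np.ndarray) -> np.ndarray:
    """计算灰度图(或二值运动目标图像)的 7 个 Hu 矩,作为其粗略的整体特征。"""
    img = img.astype(np.float64)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    def mu(p, q):   # 中心矩
        return (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
    def eta(p, q):  # 归一化中心矩
        return mu(p, q) / m00 ** (1 + (p + q) / 2.0)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    h5 = ((n30 - 3 * n12) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          + (3 * n21 - n03) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    h6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
          + 4 * n11 * (n30 + n12) * (n21 + n03))
    h7 = ((3 * n21 - n03) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          - (n30 - 3 * n12) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])

def target_hash(img: np.ndarray, scale: float = 1e4) -> np.ndarray:
    """通过取整操作对 7 个 Hu 矩做量化,得到运动目标图像的目标哈希码(示例中先放大再取整)。"""
    return np.round(hu_moments(img) * scale).astype(np.int64)
```

由于 Hu 矩对平移具有不变性,同一目标在帧内移动时其目标哈希码基本稳定,这与只提取粗略整体特征的目的相符。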
经过上述步骤A1-步骤A4分别生成了场景哈希码和目标哈希码之后,便可以生成待认证哈希码,具体的:
其中,在一个实施例中,步骤A5可执行为以下步骤:
步骤A51:确定每一个场景图像的场景哈希码的标识;并将每一个运动目标图像的目标哈希码和该目标图像的位置信息视为一个目标结构体。
其中,在一个实施例中,通过指针方式指向场景哈希码可以视为一种确定场景哈希码的标识的方法。
步骤A52:针对待认证视频的每一帧图像,根据预设空间位置关系,将该帧图像的运动目标图像的目标结构体进行级联,并在预设位置添加该帧图像对应的场景图像的标识,生成帧图像哈希码。
其中,对于场景变化不大的视频,例如监控视频,每帧图像中的最大差别在于运动目标图像的位置和形态的变化,而场景图像往往是稳定的。因此,可以使多帧图像共用一个场景图像。具体的,例如,针对每一帧图像,将该帧图像中每一个目标结构体按照其对应的运动目标图像在该帧图像中的空间位置,按照自上而下,由左至右的顺序将每一个目标结构体进行级联,然后在最后一个目标结构体后以指针的方式指向该帧图像对应的场景图像的场景哈希码,由此,最终生成了帧图像哈希码。从而实现,多帧图像共用一个场景图像的场景哈希码。
步骤A53:按照在待认证视频的时序关系,级联每一帧图像的帧图像哈希码,生成待认证视频的待认证哈希码。
其中,具体的,例如,按照在视频中的时间先后顺序级联帧图像哈希码,由此,生成最终的待认证哈希码。
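步骤A51至A53所描述的"目标结构体级联 + 场景哈希码标识 + 按时序级联"的组织方式,可以用如下 Python 草图示意(TargetStruct、分隔符以及用字符串标识代替指针等均为示例性假设):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TargetStruct:
    """一个目标结构体:运动目标图像的目标哈希码 + 其位置信息(左上角 x、y 与宽高 w、h)。"""
    hash_code: str
    rect: Tuple[int, int, int, int]

def frame_hash(targets: List[TargetStruct], scene_id: str) -> str:
    """按自上而下、由左至右的空间顺序级联目标结构体,末尾附上场景哈希码的标识(指针)。"""
    ordered = sorted(targets, key=lambda t: (t.rect[1], t.rect[0]))  # 先按 y、再按 x 排序
    body = "|".join(f"{t.rect}:{t.hash_code}" for t in ordered)
    return body + "->" + scene_id

def video_hash(frames: List[str]) -> str:
    """按时序关系级联每帧的帧图像哈希码,得到整段待认证视频的待认证哈希码。"""
    return "#".join(frames)
```

可以看到,多帧图像只需在末尾引用同一个场景哈希码标识,场景哈希码本身不必逐帧重复存储。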
本发明实施例中,通过多帧图像共用一个场景图像,从而通过减少场景图像的哈希码的数量,减少整段待认证视频的哈希码的长度,从而在进行视频认证时,因为哈希码的长度减少,可以降低哈希码的比对长度,从而提高视频认证的速度。
其中,在一个实施例中,当待认证视频通过传输通道(例如网络)传输给用户时,在传输的过程中将不可避免地对待认证视频的视频信息产生不同程度的改变,而这种改变会对视频认证结果产生一定的影响,因此,为了降低这种改变对于视频认证的影响,在步骤A1之前,还可以对待认证视频进行平滑处理。具体的,由于人眼对视频图像的色彩信息的敏感程度不同,其中,人眼对亮度信息的敏感程度远大于色彩信息中的色调信息以及饱和度信息,所以,在对视频图像进行平滑处理时,可以提取待认证视频的亮度信息,而仅对亮度信息进行平滑处理。例如可以对YUV颜色空间的Y信息进行平滑处理,其中"Y"表示明亮度(Luminance或Luma),也就是灰阶值;而"U"和"V"表示的则是色度(Chrominance或Chroma),作用是描述影像色彩及饱和度,用于指定像素的颜色。若视频图像所使用的颜色空间并非YUV颜色空间,可以根据现有技术将视频图像的颜色空间转换到YUV颜色空间,再进行亮度平滑处理。需要说明的是,可以根据实际需求,对其它的颜色信息进行平滑处理以降低上述传输过程中的改变对视频认证的影响,也可以采用其它现有技术的方法达到同样的目的,本发明实施例对此不做限定。
较佳的,可以采用高斯平滑方法进行平滑处理,也可以采用现有技术中提供的其它平滑处理方法,本发明实施例对此不做限定。
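仅对亮度分量做高斯平滑的预处理,可以用如下 Python 草图示意(假设可使用 numpy;rgb_to_y 按 BT.601 系数计算亮度,gaussian_smooth 用可分离的一维高斯核实现,核半径取 3σ,这些均为示例性假设):

```python
import numpy as np

def rgb_to_y(rgb: np.ndarray) -> np.ndarray:
    """按 BT.601 把 RGB 图像转为亮度分量 Y(只处理亮度,色度 U/V 保持不变)。"""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def gaussian_smooth(y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """用可分离的一维高斯核对 Y 分量做平滑(先对行、再对列各做一次卷积)。"""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()                                   # 核归一化,保持整体亮度不变
    pad = np.pad(y, r, mode="edge")                # 边界用复制填充
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)
```
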
至此,对待认证哈希码的生成过程已介绍完,下面对于视频认证的过程进行详述,该认证过程即对待认证视频的待认证哈希码和原始视频的参考哈希码的匹配过程,具体的可以以帧图像为单位匹配每帧图像的帧图像哈希码,在匹配每帧图像的帧图像哈希码时,可以分为场景图像的场景哈希码的匹配,和运动目标图像的目标哈希码的匹配:
1)场景哈希码的匹配
当预设区域为场景图像的分块时,步骤202可以执行为以下操作:将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以分块为单位,针对每一个分块,将该分块的哈希码与原始视频中对应分块的块哈希码进行比对,判断两分块中的块哈希码中不同位的位数是否超过第三预设阈值;当不同位的位数超过第三预设阈值时,确定该分块被篡改;当不同位的位数小于等于第三预设阈值时,确定该分块正常。
其中,在一个实施例中,第三预设阈值可以与前述的第一预设阈值相同(该第一预设阈值为,对场景图像进行编码时要求相邻特征系数间哈希码不同位的位数小于等于第一预设阈值)。当然,考虑到误差的存在,第三预设阈值也可以与前述的第一预设阈值不相同,此时该第三预设阈值可以凭经验或用户需求设定。
当第三预设阈值与第一预设阈值相同时,若确定不同位的位数超过第三预设阈值,根据前述相邻特征系数间不同位的位数较少、非相邻特征系数间不同位的位数较多的要求,则可以确定该哈希码对应的特征系数从一个量化区间跳跃整个相邻量化区间而到达不相邻的量化区间。即,该特征系数的变化引起的位置变化大于预设的整个量化区间,超越了整个容忍范围,因此判定该特征系数发生变化,也即该特征系数对应的分块发生了变化,由此确定,该分块被篡改。否则,当不同位的位数小于等于第三预设阈值时,则说明该特征系数的变化还没有跨过预设的容忍区间,即在容忍范围之内,由此确定该特征系数未发生变化,也即确定该特征系数对应的分块没有发生变化。因此,对于一个分块,其特征系数矩阵中有多个特征系数,当有至少一个特征系数的哈希码与原始视频中对应分块的对应特征系数的哈希码的不同位的位数超过第三预设阈值时,则表示该分块被篡改,否则,表示该分块正常。
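上述以分块为单位、带容忍区间的块哈希码比对,可以用如下 Python 草图示意(diff_bits、block_tampered 等函数名以及第三预设阈值取 1 均为示例性假设,与实施例二中 3 位二进制码的设定一致):

```python
def diff_bits(a: str, b: str) -> int:
    """计算两个等长二进制哈希码不同位的位数。"""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def block_tampered(block_hash: list, ref_hash: list, t3: int = 1) -> bool:
    """以分块为单位比对:只要有一个特征系数的哈希码与原始视频中对应特征系数的
    哈希码不同位的位数超过第三预设阈值 t3,即判定该分块被篡改。"""
    return any(diff_bits(d, o) > t3 for d, o in zip(block_hash, ref_hash))
```
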
以上,通过容忍区间以及前述的多区间非均匀量化的方法,能够提高视频认证的鲁棒性,相对于现有技术通过数字水印的方法进行视频认证时,本发明实施例中采用的视频认证方法能够对一些正常的操作具有鲁棒性,由此可以提高视频认证的准确性。
2)运动目标图像的目标哈希码的匹配
可以根据运动目标图像的哈希码生成方法,分为以下两种情况:
第一种情况:对于运动目标图像所占整帧图像的比例大于等于预设比例的情况,即当运动目标图像的哈希码生成方法与场景图像的哈希码生成方法相同时,则运动目标图像的哈希码匹配方法与场景图像的哈希码匹配方法相同,在此不再赘述。
第二种情况:对于运动目标图像所占整帧图像的比例小于预设比例的情况,即以运动目标图像为单位提取该运动目标图像粗略的整体特征时,可以以运动目标图像为单位进行哈希码匹配,具体的,针对每一个运动目标图像,计算该运动目标图像的哈希码与原始视频中对应运动目标图像的目标哈希码的差值,并计算该差值所占原始视频中对应运动目标图像的目标哈希码的比率;当计算得到的比率大于等于第四预设阈值时,确定该运动目标图像被篡改;当计算得到的比率小于第四预设阈值时,确定该运动目标图像正常。
具体的,当计算得到的变化率(即比率)大于等于第四预设阈值时,则确定该特征系数发生变化。对于运动目标图像,其所占整帧图像的比例较小,也即该运动目标图像较小,因此篡改操作对其的影响较大。因此,对于目标哈希码中的多个特征系数的哈希码,当有至少一个特征系数的哈希码发生变化时,则确定该运动目标图像被篡改。其中,通过公式(7)计算变化率:
$$ChangeRatio_i = \frac{\left| O_i - D_i \right|}{O_i} \quad (7)$$

在公式(7)中,ChangeRatio_i表示运动目标图像的第i个哈希码的变化率;$O_i$表示原始视频中对应运动目标图像的第i个特征系数的哈希码;$D_i$表示待认证视频中运动目标图像的第i个特征系数的哈希码。
其中,第四预设阈值的取值越大,则鲁棒性越差,敏感性越好,可以根据实际需求确定该第四预设阈值的取值。
需要说明的是,上述通过计算变化率认证运动目标图像是否被篡改的方法,也同样适用于场景图像的哈希码匹配的方法,例如可以以分块为单位,计算每个分块的每个特征系数的变化率,当有至少一个系数的变化率超过第四预设阈值时,则确定该分块被篡改。
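上述基于变化率(公式(7))的目标哈希码匹配,可以用如下 Python 草图示意(change_ratio、target_tampered 及第四预设阈值 t4 的取值均为示例性假设):

```python
def change_ratio(o: float, d: float) -> float:
    """公式(7)的示例实现:变化率 = |原始哈希码 O - 待认证哈希码 D| / 原始哈希码 O。"""
    return abs(o - d) / abs(o)

def target_tampered(ref_codes, codes, t4: float = 0.1) -> bool:
    """只要有一个特征系数哈希码的变化率达到第四预设阈值 t4,即判定该运动目标图像被篡改。"""
    return any(change_ratio(o, d) >= t4 for o, d in zip(ref_codes, codes))
```

如正文所述,t4 取得越大鲁棒性越差、敏感性越好,实际取值可按需求调整。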
综上,本发明实施例中,通过用相同的流程和方法获取原始视频和待认证视频的哈希码,并通过哈希码进行视频认证,相对于数字水印进行视频认证的方法易于实现,且不会对图像的质量产生影响;通过对视频图像进行场景图像和运动目标图像分离,并分别生成哈希码,有利于分别判断场景图像和运动目标图像是否被篡改,并进一步的,可以根据场景图像的分块确定场景图像被篡改的具体位置,根据运动目标图像的位置信息,确定运动目标图像被篡改的位置。通过在生成哈希码时进行多区间非均匀量化编码,在视频认证时通过具有容忍区间的哈希码比较方法,相对于现有的数字水印技术能够提高鲁棒性,提高视频认证的准确性。
实施例二
下面以对监控视频进行视频认证为例,由于监控视频的场景图像比较稳定,因此可以建立至少一个场景图像用于表示该监控视频的场景图像。对于监控视频的场景图像,可以通过预先分块,并对每个分块进行DCT变换进行每个分块的感知哈希特征提取,由1个DC系数和5个AC系数组成每个分块的特征系数矩阵,然后通过多区间非均匀量化获得每个分块的块哈希码。由于监控视频中运动目标图像所占整帧图像的比例较小,因此可以以每一个运动目标图像为单位,将求得的7个Hu矩作为该运动目标图像的特征系数矩阵,然后通过取整操作获得该运动目标图像的哈希码。最后,以每个分块为单位,对场景图像的块哈希码通过容忍区间的方法进行匹配;对运动目标图像,采用变化率进行匹配,完成视频认证。具体的,如图3所示,该方法包括以下步骤:
步骤301:在待认证视频的首帧图像中初始化背景模型。
具体的,步骤301中,初始化背景模型的方法可以执行为:对首帧中每个像素在其8邻域内随机抽取20个样本,组成初始的背景模型。
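步骤301所述"对首帧每个像素在其 8 邻域内随机抽取 20 个样本"的背景模型初始化,可以用如下 Python 草图示意(假设可使用 numpy;边界像素采用复制填充,init_background_model 等名称均为示例性假设):

```python
import numpy as np

def init_background_model(frame: np.ndarray, n_samples: int = 20, seed: int = 0) -> np.ndarray:
    """对首帧每个像素,在其 8 邻域内随机抽取 n_samples 个样本值,组成初始背景模型。"""
    rng = np.random.default_rng(seed)
    h, w = frame.shape
    pad = np.pad(frame, 1, mode="edge")  # 边界像素用复制填充,保证 8 邻域存在
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    model = np.empty((h, w, n_samples), dtype=frame.dtype)
    for s in range(n_samples):
        idx = rng.integers(0, 8, size=(h, w))          # 每个像素随机选一个邻居方向
        for k, (dy, dx) in enumerate(offsets):
            mask = idx == k
            model[..., s][mask] = pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w][mask]
    return model
```

后续帧中,某像素只要与其样本模型中足够多的样本值足够接近,即判为背景点,否则判为运动目标图像的像素点。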
步骤302:根据初始化后的背景模型从待认证视频的首帧图像中提取运动目标图像和背景图像,生成每个运动目标图像的位置信息,并根据背景图像生成当前场景图像,将该当前场景图像作为首帧图像对应的场景图像。
对于首帧图像中的每一个像素点,通过公式(2)确定该像素点是否属于背景点(即是否属于背景模型)。具体的,将该像素点与背景模型中对应位置的像素点的样本模型中的每一个样本值进行比较,获得样本差距,当样本差距小于样本阈值的样本值的数量大于预设数量时,则确定该像素点为背景点,否则,该像素点为运动目标图像的像素点。
运动目标图像的位置信息,根据前述公式(4),由可以包含运动目标图像的最小矩形的左上角顶点和该矩形的宽和高来表示。
步骤303:从待认证视频的第2帧图像开始对背景模型进行更新,并针对每一帧图像使用更新后的背景模型提取运动目标图像和背景图像。
更新方法参见步骤A1中的更新背景模型的方法,在此不再赘述。
步骤304:从待认证视频的第2帧图像开始,针对每一帧图像,根据提取的背景图像判断当前场景图像是否更新,当确定当前场景图像更新时,将更新后的场景图像作为与该帧图像对应的场景图像,当确定当前场景图像不更新时,将当前场景图像作为与该帧图像对应的场景图像。
具体的,监控视频的场景图像比较稳定,短时间内变化不明显,可以通过实施例一中描述的通过计算帧差或色差的方法,更新当前场景图像,在此不再赘述。
步骤305:针对每一个场景图像,对该场景图像进行分块,并生成每块场景图像的块哈希码,由该场景图像的所有块哈希码生成该场景图像的场景哈希码。
具体的,步骤305可执行为以下步骤:
步骤C1:针对每一个场景图像,对该场景图像进行分块,每一块为一个分块;并,对各分块进行DCT变换,从DCT变换后的DCT系数矩阵中,选取1个DC系数和5个较大的AC系数作为该分块的特征系数。例如可以选取所在的行号和列号之和小于5的位置的AC系数,例如行号为第一行、列号为第一列时,行号和列号之和为2,小于预设和值5,因此位于第一行第一列位置的AC系数作为选用系数。
具体的,将DCT系数矩阵中的AC系数按照从大到小的顺序排列,选取排序靠前的5个AC系数。
步骤C2:针对每一个场景图像,由该场景图像的所有分块的特征系数,构成该场景图像的特征系数集,其中,在该特征系数集中,每一行的特征系数均来自不同的分块,每一行特征系数为一种特征系数的所有成员。
步骤C3:针对特征系数集中的每一行特征系数,在该行特征系数所在的数据范围内,将该行特征系数划分为500个区间,并从该行特征系数的中值所在的区间开始向两边计算出4个谷底,由谷底将该种特征系数的数据范围划分为不同的量化区间;并将每一个量化区间内的特征系数进行量化,并编码为哈希码;其中,在编码时,相邻特征系数的哈希码的不同位的位数小于等于1,非相邻特征系数的不同位的位数大于1。
通过步骤C3实现了对场景图像中每个分块的多区间非均匀量化。其中,由于4个谷底将每行特征系数划分为5个量化区间,因此可以用3位二进制码表示每一个特征系数。由此,每一个分块由6个3位二进制码组成该分块的块哈希码。
为便于理解,如图4所示,为本发明实施例中,在预设谷底的数量为4时,进行多区间非均匀量化方法的示意图。在图4中,特征系数集中一行特征系数的范围由两端的两个黑色实心圆点表示;每个Interval代表一个谷底,4个谷底将该行特征系数划分为5个量化区间,每个量化区间的量化结果为每个量化区间上面对应的二进制码。
步骤306:以运动目标图像为单位,针对每一个运动目标图像,提取该运动目标图像的7个Hu矩作为该运动目标图像的特征系数,并通过取整操作,对该运动目标图像的特征系数进行量化,生成该运动目标图像的目标哈希码。
步骤307:针对每一个运动目标图像,将该运动目标图像的目标哈希码和该目标图像的位置信息视为一个目标结构体。
其中,需要说明的是步骤305-步骤307的执行顺序不受限。
步骤308:针对待认证视频的每一帧图像,根据预设空间位置关系,将该帧图像的运动目标图像的目标结构体进行级联,并由最后一个目标结构体以指针的方式指向该帧图像对应的场景哈希码,由此生成帧图像哈希码。
步骤309:按照每帧图像在待认证视频中的时序关系,依次级联每帧图像的帧图像哈希码,生成该待认证视频的待认证哈希码。
如图5所示,为生成待认证哈希码的组织方式的示意图,在图5中,每一个SceneHash代表一个场景图像的场景哈希码;每一个frame代表一帧图像;每一帧图像中,每一个Rect表示该帧图像中的一个运动目标图像,每一个MotObjHash表示对应运动目标图像的目标哈希码。其中以frame1为例,最后一个运动目标图像Rect3以指针的方式指向场景图像SceneHash1的场景哈希码。各帧图像以在待认证视频中的时序关系,级联起来生成待认证视频的待认证哈希码。
步骤310:获取原始视频的参考哈希码,其中该参考哈希码的生成算法与待认证哈希码的生成算法相同。
步骤311:对待认证哈希码和参考哈希码进行匹配,对待认证视频进行认证。
其一,对于场景图像,以分块为单位,针对场景图像的每一个分块,将该分块的哈希码与原始视频中对应分块的块哈希码进行比对,判断不同位的位数是否超过1个;当不同位的位数超过1个时,确定该分块被篡改;当不同位的位数小于等于1时,确定该分块正常。
在本发明实施例中,每个分块有6个特征系数,每个特征系数由3位二进制码表示,由此,可以通过公式(8)确定每个特征系数的哈希码的不同位的位数:
$$DiffBit = \sum_{i=1}^{3} \left( Dbit_i \oplus Obit_i \right) \quad (8)$$
在公式(8)中,DiffBit表示不同位的位数,Dbiti表示待认证视频中特征系数的二进制码的第i位;Obiti表示原始视频中对应分块的对应特征系数的第i位。
需要说明的是,当用k位二进制码表示每个特征系数时,公式(8)中的3可以替换为k,用以计算每个特征系数的哈希码不同位的位数。
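公式(8)按位异或求和的计算,也可以在整数表示的哈希码上用按位异或直接实现,下面是一个示意性的 Python 草图(diff_bit 及参数 k 为示例性假设,k 为每个特征系数的二进制码位数):

```python
def diff_bit(d_code: int, o_code: int, k: int = 3) -> int:
    """公式(8)的示例实现:对 k 位二进制表示的两个特征系数哈希码逐位异或并求和,
    得到不同位的位数(汉明距离)。"""
    x = (d_code ^ o_code) & ((1 << k) - 1)  # 异或后只保留低 k 位
    return bin(x).count("1")
```
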
其二,对于运动目标图像的哈希码的匹配,以运动目标图像为单位,针对待认证视频的每一个运动目标图像,计算该运动目标图像的哈希码与原始视频中对应运动目标图像的目标哈希码差值,并计算该差值所占原始视频中对应运动目标图像的目标哈希码的比率;当计算得到的比率大于等于第四预设阈值时,确定该运动目标图像被篡改;当计算得到的比率小于第四预设阈值时,确定该运动目标图像正常。
具体的计算比率的方法请参见前述公式(7)的方法,在此不再赘述。
本发明实施例,以监控视频为例,通过场景图像和运动目标图像分离,对于具有丰富特征的场景图像,进行精细的感知哈希特征提取和多区间非均匀量化编码,生成场景图像的场景哈希码,并采用具有容忍区间的哈希码比较方法进行认证,能够提高视频认证的鲁棒性,从而提高视频认证的准确性,使得本发明实施例提供的视频认证方法能够更普遍适用。对于较小的运动目标图像,进行粗略的感知哈希特征提取,在不影响认证结果的情况下能够提高提取感知哈希特征和量化编码的效率。通过由多帧图像共用一个场景图像,在生成待认证视频的待认证哈希码时,减小该哈希码的长度,在进行视频认证时,降低哈希码匹配的数量,从而提高视频认证的效率。此外,通过分块和运动目标图像的位置信息,当确定待认证视频被篡改时,可以准确地确定篡改的位置。
基于相同的构思,本发明实施例中还提供一种视频认证装置,如图6所示,为该装置的示意图,该装置包括:
待认证哈希码生成模块601,用于生成待认证视频的待认证哈希码,并获取该待认证视频的原始视频的参考哈希码,所述待认证哈希码中包括:针对待认证视频的每一帧图像,对该帧图像的图像区域进行划分得到的每个预设区域的哈希码,以及表示每个预设区域在该帧图像中所处位置的位置信息;所述参考哈希码的生成算法与所述待认证哈希码的生成算法相同;
认证模块602,用于将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以预设区域为单位,对该组待认证图像中的每个对应预设区域的哈希码进行匹配;
定位模块603,用于针对每个预设区域的哈希码,当该预设区域的哈希码与原始视频中对应预设区域的哈希码不匹配时,确定该预设区域被篡改,并根据该预设区域的位置信息确定该待认证视频被篡改的位置。
其中,在一个实施例中,如图7所示,所述待认证哈希码生成模块601,包括:
处理单元604,用于针对待认证视频的每一帧图像,获取该帧图像的背景图像和由该帧图像的非背景图像的像素点形成的至少一个运动目标图像,将每一个运动目标图像视为一个预设区域,并生成每一个运动目标图像在该帧图像中的位置信息;
场景图像生成单元605,用于将背景图像之间的差距小于等于预设差距的帧图像视为一个集合,并针对每一个集合,从该集合中选取一帧图像的背景图像生成该集合中的每一帧图像对应的场景图像;
场景哈希码生成单元606,用于针对每一个场景图像,对该场景图像进行分块,将每一个分块视为一个预设区域,并生成每块场景图像的块哈希码,由该场景图像的所有块哈希码生成该场景图像的场景哈希码;
目标哈希码生成单元607,用于针对每一个运动目标图像,生成该运动目标图像的目标哈希码;
待认证哈希码生成单元608,用于根据所述场景哈希码、所述目标哈希码、以及所述运动目标图像的位置信息生成所述待认证视频的待认证哈希码。
其中,在一个实施例中,所述待认证哈希码生成单元,包括:
确定子单元,用于确定每一个场景图像的场景哈希码的标识;并将每一个运动目标图像的目标哈希码和该目标图像的位置信息视为一个目标结构体;
帧图像哈希码生成子单元,用于针对所述待认证视频的每一帧图像,根据预设空间位置关系,将该帧图像的运动目标图像的目标结构体进行级联,并在预设位置添加该帧图像对应的场景图像的标识,生成帧图像哈希码;
待认证哈希码生成子单元,用于按照在所述待认证视频的时序关系,级联每一帧图像的帧图像哈希码,生成所述待认证视频的待认证哈希码。
其中,在一个实施例中,所述场景哈希码生成单元,包括:
感知哈希特征提取子单元,用于针对每一个场景图像,对该场景图像进行分块;并,将每一个分块视为一个预设区域,对每一块场景图像进行感知哈希特征提取;
特征系数集生成子单元,用于根据感知哈希特征提取结果,生成每一个场景图像的特征系数集;
编码子单元,用于针对特征系数集中的每一种特征系数,在该种特征系数所在的数据范围内,计算出预设数量的谷底,由所述谷底将该种特征系数的数据范围划分为不同的量化区间;并将每一个量化区间内的特征系数进行量化,并编码为哈希码;其中,在编码时,相邻特征系数的哈希码的不同位的位数小于等于第一预设阈值,非相邻特征系数的不同位的位数大于第二预设阈值,其中,第一预设阈值小于等于第二预设阈值;
场景哈希码生成子单元,用于由每一块场景图像的哈希码组成一个块哈希码,并由一个场景图像的所有块哈希码生成该场景图像的场景哈希码。
其中,在一个实施例中,如图7所示,所述认证模块602,包括:
分块匹配单元609,用于当预设区域为场景图像的分块时,将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以分块为单位,针对每一个分块,将该分块的哈希码与原始视频中对应分块的块哈希码进行比对,判断两分块中的块哈希码中不同位的位数是否超过第三预设阈值;
第一确定单元610,用于当不同位的位数超过第三预设阈值时,确定该分块被篡改;当不同位的位数小于等于第三预设阈值时,确定该分块正常。
其中,在一个实施例中,如图7所示,所述认证模块602,包括:
运动目标图像匹配单元611,用于当预设区域为运动目标图像时,将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以运动目标图像为单位,针对每一个运动目标图像,计算该运动目标图像的哈希码与原始视频中对应运动目标图像的目标哈希码的差值,并计算该差值所占原始视频中对应运动目标图像的目标哈希码的比率;
第二确定单元612,用于当计算得到的比率大于等于第四预设阈值时,确定该运动目标图像被篡改;当计算得到的比率小于第四预设阈值时,确定该运动目标图像正常。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本发明的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。
显然,本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样,倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。

Claims (12)

  1. 一种视频认证方法,其特征在于,所述方法包括:
    生成待认证视频的待认证哈希码,并获取该待认证视频的原始视频的参考哈希码,所述待认证哈希码中包括:针对待认证视频的每一帧图像,对该帧图像的图像区域进行划分得到的每个预设区域的哈希码,以及表示每个预设区域在该帧图像中所处位置的位置信息;所述参考哈希码的生成算法与所述待认证哈希码的生成算法相同;
    将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以预设区域为单位,对该组待认证图像中的每个对应预设区域的哈希码进行匹配;并,
    针对每个预设区域的哈希码,当该预设区域的哈希码与原始视频中对应预设区域的哈希码不匹配时,确定该预设区域被篡改,并根据该预设区域的位置信息确定该待认证视频被篡改的位置。
  2. 根据权利要求1所述的方法,其特征在于,所述生成待认证视频的待认证哈希码,包括:
    针对待认证视频的每一帧图像,获取该帧图像的背景图像和由该帧图像的非背景图像的像素点形成的至少一个运动目标图像,将每一个运动目标图像视为一个预设区域,并生成每一个运动目标图像在该帧图像中的位置信息;
    将背景图像之间的差距小于等于预设差距的帧图像视为一个集合,并针对每一个集合,从该集合中选取一帧图像的背景图像生成该集合中的每一帧图像对应的场景图像;
    针对每一个场景图像,对该场景图像进行分块,将每一个分块视为一个预设区域,并生成每块场景图像的块哈希码,由该场景图像的所有块哈希码生成该场景图像的场景哈希码;
    针对每一个运动目标图像,生成该运动目标图像的目标哈希码;
    根据所述场景哈希码、所述目标哈希码、以及所述运动目标图像的位置信息生成所述待认证视频的待认证哈希码。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述场景哈希码、所述目标哈希码、以及所述运动目标图像的位置信息生成所述待认证视频的待认证哈希码,包括:
    确定每一个场景图像的场景哈希码的标识;并将每一个运动目标图像的目标哈希码和该目标图像的位置信息视为一个目标结构体;
    针对所述待认证视频的每一帧图像,根据预设空间位置关系,将该帧图像的运动目标图像的目标结构体进行级联,并在预设位置添加该帧图像对应的场景图像的标识,生成帧图像哈希码;
    按照在所述待认证视频的时序关系,级联每一帧图像的帧图像哈希码,生成所述待认证视频的待认证哈希码。
  4. 根据权利要求2所述的方法,其特征在于,所述针对每一个场景图像,对该场景图像进行分块,将每一个分块视为一个预设区域,并生成每块场景图像的块哈希码,由该场景图像的所有块哈希码生成该场景图像的场景哈希码,包括:
    针对每一个场景图像,对该场景图像进行分块;并,将每一个分块视为一个预设区域,对每一块场景图像进行感知哈希特征提取;
    根据感知哈希特征提取结果,生成每一个场景图像的特征系数集;
    针对特征系数集中的每一种特征系数,在该种特征系数所在的数据范围内,计算出预设数量的谷底,由所述谷底将该种特征系数的数据范围划分为不同的量化区间;并将每一个量化区间内的特征系数进行量化,并编码为哈希码;其中,在编码时,相邻特征系数的哈希码的不同位的位数小于等于第一预设阈值,非相邻特征系数的不同位的位数大于第二预设阈值,其中,第一预设阈值小于等于第二预设阈值;
    由每一块场景图像的哈希码组成一个块哈希码,并由一个场景图像的所有块哈希码生成该场景图像的场景哈希码。
  5. 根据权利要求1-4中任一所述的方法,其特征在于,当预设区域为场景图像的分块时,所述将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以预设区域为单位,对该组待认证图像中的每个对应预设区域的哈希码进行匹配,包括:
    将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以分块为单位,针对每一个分块,将该分块的哈希码与原始视频中对应分块的块哈希码进行比对,判断两分块中的块哈希码中不同位的位数是否超过第三预设阈值;
    当不同位的位数超过第三预设阈值时,确定该分块被篡改;当不同位的位数小于等于第三预设阈值时,确定该分块正常。
  6. 根据权利要求1-4中任一所述的方法,其特征在于,当预设区域为运动目标图像时,所述将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以预设区域为单位,对该组待认证图像中的每个对应预设区域的哈希码进行匹配,包括:
    将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以运动目标图像为单位,针对每一个运动目标图像,计算该运动目标图像的哈希码与原始视频中对应运动目标图像的目标哈希码的差值,并计算该差值所占原始视频中对应运动目标图像的目标哈希码的比率;
    当计算得到的比率大于等于第四预设阈值时,确定该运动目标图像被篡改;当计算得到的比率小于第四预设阈值时,确定该运动目标图像正常。
  7. 一种视频认证装置,其特征在于,所述装置包括:
    待认证哈希码生成模块,用于生成待认证视频的待认证哈希码,并获取该待认证视频的原始视频的参考哈希码,所述待认证哈希码中包括:针对待认证视频的每一帧图像,对该帧图像的图像区域进行划分得到的每个预设区域的哈希码,以及表示每个预设区域在该帧图像中所处位置的位置信息;所述参考哈希码的生成算法与所述待认证哈希码的生成算法相同;
    认证模块,用于将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以预设区域为单位,对该组待认证图像中的每个对应预设区域的哈希码进行匹配;
    定位模块,用于针对每个预设区域的哈希码,当该预设区域的哈希码与原始视频中对应预设区域的哈希码不匹配时,确定该预设区域被篡改,并根据该预设区域的位置信息确定该待认证视频被篡改的位置。
  8. 根据权利要求7所述的装置,其特征在于,所述待认证哈希码生成模块,包括:
    处理单元,用于针对待认证视频的每一帧图像,获取该帧图像的背景图像和由该帧图像的非背景图像的像素点形成的至少一个运动目标图像,将每一个运动目标图像视为一个预设区域,并生成每一个运动目标图像在该帧图像中的位置信息;
    场景图像生成单元,用于将背景图像之间的差距小于等于预设差距的帧图像视为一个集合,并针对每一个集合,从该集合中选取一帧图像的背景图像生成该集合中的每一帧图像对应的场景图像;
    场景哈希码生成单元,用于针对每一个场景图像,对该场景图像进行分块,将每一个分块视为一个预设区域,并生成每块场景图像的块哈希码,由该场景图像的所有块哈希码生成该场景图像的场景哈希码;
    目标哈希码生成单元,用于针对每一个运动目标图像,生成该运动目标图像的目标哈希码;
    待认证哈希码生成单元,用于根据所述场景哈希码、所述目标哈希码、以及所述运动目标图像的位置信息生成所述待认证视频的待认证哈希码。
  9. 根据权利要求8所述的装置,其特征在于,所述待认证哈希码生成单元,包括:
    确定子单元,用于确定每一个场景图像的场景哈希码的标识;并将每一个运动目标图像的目标哈希码和该目标图像的位置信息视为一个目标结构体;
    帧图像哈希码生成子单元,用于针对所述待认证视频的每一帧图像,根据预设空间位置关系,将该帧图像的运动目标图像的目标结构体进行级联,并在预设位置添加该帧图像对应的场景图像的标识,生成帧图像哈希码;
    待认证哈希码生成子单元,用于按照在所述待认证视频的时序关系,级联每一帧图像的帧图像哈希码,生成所述待认证视频的待认证哈希码。
  10. 根据权利要求8所述的装置,其特征在于,所述场景哈希码生成单元,包括:
    感知哈希特征提取子单元,用于针对每一个场景图像,对该场景图像进行分块;并,将每一个分块视为一个预设区域,对每一块场景图像进行感知哈希特征提取;
    特征系数集生成子单元,用于根据感知哈希特征提取结果,生成每一个场景图像的特征系数集;
    编码子单元,用于针对特征系数集中的每一种特征系数,在该种特征系数所在的数据范围内,计算出预设数量的谷底,由所述谷底将该种特征系数的数据范围划分为不同的量化区间;并将每一个量化区间内的特征系数进行量化,并编码为哈希码;其中,在编码时,相邻特征系数的哈希码的不同位的位数小于等于第一预设阈值,非相邻特征系数的不同位的位数大于第二预设阈值,其中,第一预设阈值小于等于第二预设阈值;
    场景哈希码生成子单元,用于由每一块场景图像的哈希码组成一个块哈希码,并由一个场景图像的所有块哈希码生成该场景图像的场景哈希码。
  11. 根据权利要求7-10中任一所述的装置,其特征在于,所述认证模块,包括:
    分块匹配单元,用于当预设区域为场景图像的分块时,将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以分块为单位,针对每一个分块,将该分块的哈希码与原始视频中对应分块的块哈希码进行比对,判断两分块中的块哈希码中不同位的位数是否超过第三预设阈值;
    第一确定单元,用于当不同位的位数超过第三预设阈值时,确定该分块被篡改;当不同位的位数小于等于第三预设阈值时,确定该分块正常。
  12. 根据权利要求7-10中任一所述的装置,其特征在于,所述认证模块,包括:
    运动目标图像匹配单元,用于当预设区域为运动目标图像时,将待认证视频和原始视频中帧号相同的图像视为一组待认证图像,以运动目标图像为单位,针对每一个运动目标图像,计算该运动目标图像的哈希码与原始视频中对应运动目标图像的目标哈希码的差值,并计算该差值所占原始视频中对应运动目标图像的目标哈希码的比率;
    第二确定单元,用于当计算得到的比率大于等于第四预设阈值时,确定该运动目标图像被篡改;当计算得到的比率小于第四预设阈值时,确定该运动目标图像正常。
PCT/CN2014/095369 2014-11-28 2014-12-29 一种视频认证方法及装置 WO2016082277A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410713432.7A CN104581431B (zh) 2014-11-28 2014-11-28 一种视频认证方法及装置
CN201410713432.7 2014-11-28

Publications (1)

Publication Number Publication Date
WO2016082277A1 true WO2016082277A1 (zh) 2016-06-02

Family

ID=53096468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/095369 WO2016082277A1 (zh) 2014-11-28 2014-12-29 一种视频认证方法及装置

Country Status (2)

Country Link
CN (1) CN104581431B (zh)
WO (1) WO2016082277A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110176043A (zh) * 2019-05-30 2019-08-27 兰州交通大学 一种运用dct感知哈希矢量地理空间数据内容认证方法
CN110377454A (zh) * 2019-06-17 2019-10-25 中国平安人寿保险股份有限公司 数据校验方法、装置、计算机设备和存储介质
CN111510297A (zh) * 2020-03-24 2020-08-07 兰州交通大学 全局与局部特征结合的高分辨率遥感影像完整性认证方法
CN112383672A (zh) * 2020-10-21 2021-02-19 南京邮电大学 一种兼顾隐私保护和数据质量的图像采集方法、装置及存储介质
CN113242409A (zh) * 2021-04-26 2021-08-10 深圳市安星数字***有限公司 基于无人机的夜视预警方法、装置、无人机及存储介质
CN113672761A (zh) * 2021-07-16 2021-11-19 北京奇艺世纪科技有限公司 视频处理方法及装置
CN114153411A (zh) * 2021-12-02 2022-03-08 上海交通大学 面向远程终端管控的图像优化传输***
CN115830508A (zh) * 2022-12-13 2023-03-21 陕西通信规划设计研究院有限公司 一种5g消息内容检测方法
CN118075572A (zh) * 2024-03-14 2024-05-24 北京和人广智科技有限公司 一种视频投屏拍摄溯源方法

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106454384B (zh) * 2015-08-04 2019-06-25 中国科学院深圳先进技术研究院 视频帧插入和帧删除检测方法
CN106454385B (zh) * 2015-08-04 2019-06-25 中国科学院深圳先进技术研究院 视频帧篡改检测方法
CN107135421B (zh) * 2017-06-13 2020-08-07 北京市博汇科技股份有限公司 视频特征检测方法及装置
CN109587518B (zh) * 2017-09-28 2022-06-07 三星电子株式会社 图像传输装置、操作图像传输装置的方法以及片上***
CN109063428A (zh) * 2018-06-27 2018-12-21 武汉大学深圳研究院 一种数字动画的篡改检测方法及其系统
CN109361952A (zh) * 2018-12-14 2019-02-19 司马大大(北京)智能***有限公司 视频管理方法、装置、***及电子设备
US11106827B2 (en) 2019-03-26 2021-08-31 Rovi Guides, Inc. System and method for identifying altered content
CA3104949A1 (en) * 2019-03-26 2020-10-01 Rovi Guides, Inc. System and method for identifying altered content
US11134318B2 (en) 2019-03-26 2021-09-28 Rovi Guides, Inc. System and method for identifying altered content
CN110035327B (zh) * 2019-04-17 2020-07-17 深圳市摩天之星企业管理有限公司 一种安全播放方法
WO2021183645A1 (en) * 2020-03-11 2021-09-16 Bytedance Inc. Indication of digital media integrity
CN112101155B (zh) * 2020-09-02 2024-04-26 北京博睿维讯科技有限公司 一种显示内容核实方法、装置、***及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006295665A (ja) * 2005-04-13 2006-10-26 Mitsubishi Electric Corp 画像記録装置及び画像記録方法
CN1858799A (zh) * 2005-05-08 2006-11-08 中国科学院计算技术研究所 一种数字图像哈希签名方法
CN101964041A (zh) * 2010-09-25 2011-02-02 合肥工业大学 一种基于感知哈希的实用安全图像取证系统及其取证方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576896A (zh) * 2008-05-09 2009-11-11 鸿富锦精密工业(深圳)有限公司 相似图片检索***及方法
CN102393900B (zh) * 2011-07-02 2013-05-29 山东大学 基于鲁棒哈希的视频拷贝检测方法
US8799236B1 (en) * 2012-06-15 2014-08-05 Amazon Technologies, Inc. Detecting duplicated content among digital items
CN103744973A (zh) * 2014-01-11 2014-04-23 西安电子科技大学 基于多特征哈希的视频拷贝检测方法
CN103747254A (zh) * 2014-01-27 2014-04-23 深圳大学 一种基于时域感知哈希的视频篡改检测方法和装置
CN103747271B (zh) * 2014-01-27 2017-02-01 深圳大学 一种基于混合感知哈希的视频篡改检测方法和装置
CN104077590A (zh) * 2014-06-30 2014-10-01 安科智慧城市技术(中国)有限公司 一种视频指纹提取方法及系统


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110176043A (zh) * 2019-05-30 2019-08-27 兰州交通大学 一种运用dct感知哈希矢量地理空间数据内容认证方法
CN110176043B (zh) * 2019-05-30 2022-09-30 兰州交通大学 一种运用dct感知哈希矢量地理空间数据内容认证方法
CN110377454A (zh) * 2019-06-17 2019-10-25 中国平安人寿保险股份有限公司 数据校验方法、装置、计算机设备和存储介质
CN111510297A (zh) * 2020-03-24 2020-08-07 兰州交通大学 全局与局部特征结合的高分辨率遥感影像完整性认证方法
CN111510297B (zh) * 2020-03-24 2023-09-08 兰州交通大学 全局与局部特征结合的高分辨率遥感影像完整性认证方法
CN112383672A (zh) * 2020-10-21 2021-02-19 南京邮电大学 一种兼顾隐私保护和数据质量的图像采集方法、装置及存储介质
CN113242409A (zh) * 2021-04-26 2021-08-10 深圳市安星数字***有限公司 基于无人机的夜视预警方法、装置、无人机及存储介质
CN113242409B (zh) * 2021-04-26 2023-09-12 国网安徽省电力有限公司天长市供电公司 基于无人机的夜视预警方法、装置、无人机及存储介质
CN113672761B (zh) * 2021-07-16 2023-07-25 北京奇艺世纪科技有限公司 视频处理方法及装置
CN113672761A (zh) * 2021-07-16 2021-11-19 北京奇艺世纪科技有限公司 视频处理方法及装置
CN114153411A (zh) * 2021-12-02 2022-03-08 上海交通大学 面向远程终端管控的图像优化传输***
CN114153411B (zh) * 2021-12-02 2024-01-12 上海交通大学 面向远程终端管控的图像优化传输***
CN115830508A (zh) * 2022-12-13 2023-03-21 陕西通信规划设计研究院有限公司 一种5g消息内容检测方法
CN115830508B (zh) * 2022-12-13 2024-03-08 陕西通信规划设计研究院有限公司 一种5g消息内容检测方法
CN118075572A (zh) * 2024-03-14 2024-05-24 北京和人广智科技有限公司 一种视频投屏拍摄溯源方法

Also Published As

Publication number Publication date
CN104581431A (zh) 2015-04-29
CN104581431B (zh) 2018-01-30

Similar Documents

Publication Publication Date Title
WO2016082277A1 (zh) 一种视频认证方法及装置
Tang et al. Robust image hashing based on color vector angle and Canny operator
WO2021068330A1 (zh) 智能图像分割及分类方法、装置及计算机可读存储介质
Sadek et al. Robust video steganography algorithm using adaptive skin-tone detection
CN103238159B (zh) 用于图像认证的系统和方法
JP4732660B2 (ja) ビジュアルアテンションシステム
Tang et al. Perceptual image hashing with weighted DWT features for reduced-reference image quality assessment
Kumar et al. Near lossless image compression using parallel fractal texture identification
CN104661037B (zh) 压缩图像量化表篡改的检测方法和系统
CN111050022B (zh) 一种高安全性图像传输系统及方法
WO2015167901A1 (en) Video fingerprinting
CN112507842A (zh) 一种基于关键帧提取的视频文字识别方法和装置
Yang et al. A clustering-based framework for improving the performance of JPEG quantization step estimation
CN106503112B (zh) 视频检索方法和装置
US9305603B2 (en) Method and apparatus for indexing a video stream
CN114626967A (zh) 一种数字水印嵌入与提取方法、装置、设备及存储介质
Xie et al. Bag-of-words feature representation for blind image quality assessment with local quantized pattern
Yao et al. An improved first quantization matrix estimation for nonaligned double compressed JPEG images
CN105227964B (zh) 视频认证方法及系统
Li et al. Perceptual hashing for color images
Sharma et al. A review of passive forensic techniques for detection of copy-move attacks on digital videos
CN110930287B (zh) 一种图像隐写检测方法、装置及计算机设备、存储介质
CN106055632B (zh) 基于场景帧指纹的视频认证方法
Baroffio et al. Hybrid coding of visual content and local image features
Tian et al. Just noticeable difference modeling for face recognition system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14907027

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14907027

Country of ref document: EP

Kind code of ref document: A1