CN114710713A - Automatic video abstract generation method based on deep learning - Google Patents
- Publication number
- CN114710713A (application CN202210337196.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N21/8549 — Creating video summaries, e.g. movie trailer
- H04N21/84 — Generation or processing of descriptive data, e.g. content descriptors
- H04N23/665 — Control of cameras involving internal camera communication with the image sensor
- H04N23/67 — Focus control based on electronic image sensor signals
- H04N23/698 — Control of cameras for achieving an enlarged field of view, e.g. panoramic image capture
- H04N23/90 — Arrangement of cameras or camera modules, e.g. multiple cameras
- H04N5/265 — Studio circuits: mixing
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a deep-learning-based automatic video summary generation method. Different azimuth regions of the same environmental scene are shot synchronously to obtain a plurality of environmental-scene sub-videos. Each sub-video undergoes recognition processing to obtain semantic labels for the different objects that appear in it, after which a video content summary is composed in a preset area of the sub-video picture. Finally, the pictures of all sub-videos are stitched together according to each sub-video's shooting azimuth to obtain the corresponding environmental panoramic video. In this way, the sub-videos shot by different cameras can be recognized and analyzed synchronously, the objects appearing in them can be annotated, and matching video content summaries can be generated, so that the video is screened and identified comprehensively and accurately, improving the automation and intelligence of video recognition processing.
Description
Technical Field
The invention relates to the technical field of video data processing, and in particular to a deep-learning-based automatic video summary generation method.
Background
At present, camera monitoring devices are commonly installed in public places to capture images of the site in real time; the captured surveillance footage is then identified and analyzed to screen for abnormal persons or situations. In the prior art, this screening is essentially manual: large numbers of personnel must inspect the footage frame by frame, and their individual findings cannot be aggregated and integrated. As a result, the footage cannot be screened and identified comprehensively and accurately, no deeper processing of the footage is possible, and the automation and intelligence of surveillance-image recognition are reduced.
Disclosure of Invention
To address the above defects in the prior art, the invention provides a deep-learning-based automatic video summary generation method. Different azimuth regions of the same environmental scene are shot synchronously to obtain a plurality of environmental-scene sub-videos. Each sub-video undergoes recognition processing to obtain semantic labels for the different objects that appear in it, after which a video content summary is composed in a preset area of the sub-video picture. Finally, the pictures of all sub-videos are stitched together according to each sub-video's shooting azimuth to obtain the corresponding environmental panoramic video. In this way, the sub-videos shot by different cameras can be recognized and analyzed synchronously, the objects appearing in them can be annotated, and matching video content summaries can be generated, so that the video is screened and identified comprehensively and accurately, improving the automation and intelligence of video recognition processing.
The invention provides a deep-learning-based automatic video summary generation method comprising the following steps:
Step S1: synchronously shoot different azimuth regions of the same environmental scene with a plurality of cameras to acquire a plurality of environmental-scene sub-videos; group all sub-videos by shooting azimuth and store them in a blockchain.
Step S2: in response to a video acquisition request from a video processing terminal, extract the corresponding environmental-scene sub-videos from the blockchain and transmit them to the terminal; then run recognition on each sub-video to obtain semantic labels for the different objects appearing in it.
Step S3: compose a video content summary in a preset area of the sub-video picture according to the semantic labels; then apply data compression to the sub-video.
Step S4: stitch the pictures of all sub-videos together according to each sub-video's shooting azimuth, thereby obtaining the corresponding environmental panoramic video.
Further, in step S1, synchronously shooting different azimuth regions of the same environmental scene with a plurality of cameras to acquire a plurality of environmental-scene sub-videos specifically comprises:
aiming the shooting directions of the cameras at different azimuth regions along the circumference of the environmental scene, while adjusting each camera's shooting angle of view so that the combined angles of view of all cameras completely cover the full circumferential region of the scene;
then instructing all cameras to shoot synchronously at the same focal length, thereby acquiring the plurality of environmental-scene sub-videos.
Further, in step S1, grouping all environmental-scene sub-videos by shooting azimuth and storing them in the blockchain specifically comprises:
acquiring the shooting-azimuth information of each camera and adding it to the corresponding sub-video as video index information; then storing all sub-video groups in the blockchain.
Further, in step S2, extracting the corresponding environmental-scene sub-videos from the blockchain according to a video acquisition request from a video processing terminal and transmitting them to the terminal specifically comprises:
extracting the video-shooting time-range condition from the request, extracting the sub-videos whose shooting times match that range from the blockchain, and synchronously transmitting all extracted sub-videos to the video processing terminal.
Further, in step S2, running recognition on the environmental-scene sub-video to obtain semantic labels for the different objects appearing in it specifically comprises:
decomposing the sub-video into a number of scene picture frames in the order of its video-stream time axis;
running recognition on each picture frame to obtain the identity-attribute information and action-attribute information of the different objects initially selected in it;
and generating identity-attribute and action-attribute semantic labels for each object from that information.
Further, in step S3, composing a video content summary in a preset area of the sub-video picture according to the semantic labels specifically comprises:
generating a text summary of each object's identity state and action state from its identity-attribute and action-attribute semantic labels;
selecting a preset summary-insertion area in each picture frame where the object appears, the summary-insertion area not overlapping the picture area occupied by the object;
and, after adding the text summary to the summary-insertion area, displaying it with an adaptively enlarged font.
Further, in step S3, the data compression of the environmental-scene sub-video specifically comprises:
recombining all scene picture frames in the order of the sub-video's video-stream time axis to re-obtain the sub-video, and then applying fidelity compression to it.
Further, in step S3, the fidelity compression of the environmental-scene sub-video specifically comprises:
Step S301: screen out the fidelity-compression pixel value of the video from the environmental-scene sub-video using the following formula (1):

l = min_{a=1,...,G} min_{i=1,...,n; j=1,...,m} L_a(i,j)    (1)

where l is the fidelity-compression pixel value of the sub-video; L_a(i,j) is the pixel value at row i, column j of the a-th frame; n is the number of pixel rows and m the number of pixels per row in each frame; G is the total number of frames; the inner minimum is taken over i from 1 to n and j from 1 to m, and the outer minimum over a from 1 to G.

Step S302: apply fidelity compression to the sub-video from the fidelity-compression pixel value using the following formula (2):

L̃_a(i,j) = L_a(i,j) − l,  i = 1,...,n; j = 1,...,m    (2)

where L̃_a denotes the pixel data (in pixel-matrix form) of the a-th frame after fidelity compression; i runs from 1 to n and j from 1 to m so that every pixel is computed.

Step S303: judge from the compressed sub-video data whether the compression is effective, and control whether the compressed data must be restored, using the following formula (3):

Y = 1 if h(compressed sub-video data) ≥ h(original sub-video data), otherwise Y = 0    (3)

where Y is the restoration control value of the data and h() gives the data volume of the data in its parentheses.
If Y = 1, the fidelity-compressed sub-video needs to be restored;
if Y = 0, it does not need to be restored.
Further, in step S4, stitching the pictures of all environmental-scene sub-videos according to each sub-video's shooting azimuth to obtain the corresponding environmental panoramic video specifically comprises:
seamlessly stitching the pictures of all sub-videos according to each sub-video's shooting azimuth and shooting time axis, thereby obtaining the corresponding environmental panoramic video.
Compared with the prior art, the deep-learning-based automatic video summary generation method shoots different azimuth regions of the same environmental scene synchronously to obtain a plurality of environmental-scene sub-videos; each sub-video undergoes recognition processing to obtain semantic labels for the different objects that appear in it, after which a video content summary is composed in a preset area of the sub-video picture; finally, the pictures of all sub-videos are stitched together according to each sub-video's shooting azimuth to obtain the corresponding environmental panoramic video. In this way, the sub-videos shot by different cameras can be recognized and analyzed synchronously, the objects appearing in them can be annotated, and matching video content summaries can be generated, so that the video is screened and identified comprehensively and accurately, improving the automation and intelligence of video recognition processing.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the deep-learning-based automatic video summary generation method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only a part of the embodiments of the invention, not all of them; all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the invention.
Fig. 1 is a schematic flow chart of the deep-learning-based automatic video summary generation method according to an embodiment of the present invention. The method comprises the following steps:
Step S1: synchronously shoot different azimuth regions of the same environmental scene with a plurality of cameras to acquire a plurality of environmental-scene sub-videos; group all sub-videos by shooting azimuth and store them in a blockchain.
Step S2: in response to a video acquisition request from a video processing terminal, extract the corresponding environmental-scene sub-videos from the blockchain and transmit them to the terminal; then run recognition on each sub-video to obtain semantic labels for the different objects appearing in it.
Step S3: compose a video content summary in a preset area of the sub-video picture according to the semantic labels; then apply data compression to the sub-video.
Step S4: stitch the pictures of all sub-videos together according to each sub-video's shooting azimuth, thereby obtaining the corresponding environmental panoramic video.
The beneficial effects of the above technical solution are: the method shoots different azimuth regions of the same environmental scene synchronously to obtain a plurality of environmental-scene sub-videos; recognition on each sub-video yields semantic labels for the different objects appearing in it, after which a video content summary is composed in a preset area of the sub-video picture; finally, the pictures of all sub-videos are stitched together by shooting azimuth into the corresponding environmental panoramic video. The sub-videos shot by different cameras are thus recognized and analyzed synchronously, the objects in them are annotated, and matching content summaries are generated, so that the video is screened and identified comprehensively and accurately, improving the automation and intelligence of video recognition processing.
Preferably, in step S1, synchronously shooting different azimuth regions of the same environmental scene with a plurality of cameras to acquire a plurality of environmental-scene sub-videos specifically comprises:
aiming the shooting directions of the cameras at different azimuth regions along the circumference of the environmental scene, while adjusting each camera's shooting angle of view so that the combined angles of view of all cameras completely cover the full circumferential region of the scene;
then instructing all cameras to shoot synchronously at the same focal length, thereby acquiring the plurality of environmental-scene sub-videos.
The beneficial effects of the above technical solution are: aiming the cameras at different azimuth regions along the circumference of the same environmental scene lets each camera independently shoot video of its own azimuth region, achieving dead-angle-free panoramic coverage of the scene and improving the real-time performance of its video capture. In addition, instructing all cameras to shoot synchronously at the same focal length guarantees that the sub-videos captured by the different cameras share the same depth-of-focus range, which facilitates their subsequent stitching and integration.
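The full-coverage condition described above can be checked numerically: the cameras' horizontal angles of view, spread along the circumference, must leave no gap. A minimal sketch under the assumption of evenly spaced cameras (the function and parameter names are illustrative, not from the patent):

```python
# Sketch: check that N cameras arranged evenly around a scene cover the full
# 360-degree circumference. The uniform-spacing assumption is illustrative.

def covers_full_circle(fovs_deg):
    """Return True if cameras spaced evenly around a circle, each with the
    given horizontal field of view (degrees), cover all 360 degrees."""
    n = len(fovs_deg)
    if n == 0:
        return False
    spacing = 360.0 / n  # angular gap between adjacent optical axes
    # Each gap between adjacent axes must be covered by the two half-FOVs.
    return all(fovs_deg[i] / 2 + fovs_deg[(i + 1) % n] / 2 >= spacing
               for i in range(n))
```

For example, six cameras with 70-degree lenses cover the circle with overlap, while four such cameras leave gaps.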
Preferably, in step S1, grouping all environmental-scene sub-videos by shooting azimuth and storing them in the blockchain specifically comprises:
acquiring the shooting-azimuth information of each camera and adding it to the corresponding sub-video as video index information; then storing all sub-video groups in the blockchain.
The beneficial effects of the above technical solution are: different cameras shoot from different azimuths, so adding each camera's shooting-azimuth information to its sub-video as video index information allows the required sub-video to be located quickly and accurately in the blockchain later.
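A minimal sketch of the indexing scheme described above — attaching the shooting azimuth to each sub-video as index information and grouping the records for storage — with the blockchain abstracted away and all names assumed:

```python
# Sketch: azimuth-indexed sub-video records grouped into sectors for
# storage. The blockchain itself is out of scope; groups would be what a
# block stores. All names and the 90-degree sector size are illustrative.
from dataclasses import dataclass, field

@dataclass
class SceneSubVideo:
    camera_id: str
    azimuth_deg: float          # shooting azimuth, used as the video index
    start_time: float           # capture start, seconds since epoch
    frames: list = field(default_factory=list)

def group_by_azimuth(sub_videos, sector_deg=90):
    """Group sub-videos into azimuth sectors so each stored group holds the
    videos shot toward the same direction."""
    groups = {}
    for v in sub_videos:
        sector = int(v.azimuth_deg % 360 // sector_deg)
        groups.setdefault(sector, []).append(v)
    return groups
```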
Preferably, in step S2, extracting the corresponding environmental-scene sub-videos from the blockchain according to a video acquisition request from a video processing terminal and transmitting them to the terminal specifically comprises:
extracting the video-shooting time-range condition from the request, extracting the sub-videos whose shooting times match that range from the blockchain, and synchronously transmitting all extracted sub-videos to the video processing terminal.
The beneficial effects of the above technical solution are: in practical applications, the video processing terminal may be, but is not limited to, a computer with image-processing capability. The terminal sends a video acquisition request to the blockchain, which returns the sub-videos shot within the time range specified in the request, making it convenient to run recognition on the sub-videos of different time periods.
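The time-range matching described above reduces to an interval-overlap test; a hedged sketch with assumed record fields:

```python
# Sketch: serve a video-acquisition request by returning the stored
# sub-videos whose capture interval overlaps the requested shooting-time
# range. Records are modelled as dicts; the field names are illustrative.

def extract_by_time_range(records, t_start, t_end):
    """records: iterable of dicts with 'start' and 'end' capture times.
    Returns those whose [start, end] interval overlaps [t_start, t_end]."""
    return [r for r in records if r["start"] <= t_end and r["end"] >= t_start]
```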
Preferably, in step S2, running recognition on the environmental-scene sub-video to obtain semantic labels for the different objects appearing in it specifically comprises:
decomposing the sub-video into a number of scene picture frames in the order of its video-stream time axis;
running recognition on each picture frame to obtain the identity-attribute information and action-attribute information of the different objects initially selected in it;
and generating identity-attribute and action-attribute semantic labels for each object from that information.
The beneficial effects of the above technical solution are: decomposing the sub-video into picture frames along its video-stream time axis allows fine-grained recognition processing. Specifically, face recognition and limb-movement recognition are performed on the person objects present in each frame to obtain their identity-attribute and action-attribute information, from which labels in semantic text form are generated, so that the real-time behaviour of each person in the scene can be represented textually.
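The decomposition-and-labelling pipeline above can be sketched as follows, with a stub recognizer standing in for the patent's deep-learning models; all names and the label format are illustrative:

```python
# Sketch: decompose a sub-video into time-ordered frames, then turn each
# detected object's (identity, action) pair into semantic text labels.
# The recognizer is a placeholder for face/limb-movement recognition models.

def decompose(video_frames):
    """A sub-video is modelled as a time-ordered list of frames; decomposition
    enumerates them with their frame index along the time axis."""
    return list(enumerate(video_frames))

def label_objects(frame, recognizer):
    """recognizer(frame) -> list of (identity, action) tuples for each object
    initially selected in the frame; returns semantic text labels."""
    return [{"identity_label": f"identity:{identity}",
             "action_label": f"action:{action}"}
            for identity, action in recognizer(frame)]
```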
Preferably, in step S3, composing a video content summary in a preset area of the sub-video picture according to the semantic labels specifically comprises:
generating a text summary of each object's identity state and action state from its identity-attribute and action-attribute semantic labels;
selecting a preset summary-insertion area in each picture frame where the object appears, the summary-insertion area not overlapping the picture area occupied by the object;
and, after adding the text summary to the summary-insertion area, displaying it with an adaptively enlarged font.
The beneficial effects of the above technical solution are: adaptively combining the identity-attribute and action-attribute semantic labels yields a text summary of the object's identity and action states, so that reading the summary gives an accurate and timely picture of the person's real-time behaviour. A preset summary-insertion area is then selected in each frame where the person appears and the text summary is added there, so that a viewer obtains the person's real-time status while watching the sub-video, improving its visual watchability.
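Selecting a summary-insertion area that does not overlap the object, as described above, can be sketched as a search over candidate corner regions; the candidate order, the region size, and all names are assumptions:

```python
# Sketch: pick a summary region in the frame that does not intersect the
# detected object's bounding box, so the text never covers the person it
# describes. Rectangles are (x, y, w, h) in pixels.

def rects_overlap(a, b):
    """Axis-aligned overlap test for two (x, y, w, h) rectangles."""
    return not (a[0] + a[2] <= b[0] or b[0] + b[2] <= a[0] or
                a[1] + a[3] <= b[1] or b[1] + b[3] <= a[1])

def pick_summary_region(frame_w, frame_h, object_box, region_w=200, region_h=60):
    """Try the four frame corners in a fixed order and return the first one
    that does not intersect the object's bounding box."""
    candidates = [(0, 0), (frame_w - region_w, 0),
                  (0, frame_h - region_h),
                  (frame_w - region_w, frame_h - region_h)]
    for x, y in candidates:
        region = (x, y, region_w, region_h)
        if not rects_overlap(region, object_box):
            return region
    return None  # no free corner; the caller may shrink the region
```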
Preferably, in step S3, the data compression of the environmental-scene sub-video specifically comprises:
recombining all scene picture frames in the order of the sub-video's video-stream time axis to re-obtain the sub-video, and then applying fidelity compression to it.
The beneficial effects of the above technical solution are: recombining the picture frames in the order of the video-stream time axis re-obtains the sub-video, so that every frame it contains displays the real-time status of the person objects.
Preferably, in step S3, the fidelity compression of the environmental-scene sub-video specifically comprises:
Step S301: screen out the fidelity-compression pixel value of the video from the environmental-scene sub-video using the following formula (1):

l = min_{a=1,...,G} min_{i=1,...,n; j=1,...,m} L_a(i,j)    (1)

where l is the fidelity-compression pixel value of the sub-video; L_a(i,j) is the pixel value at row i, column j of the a-th frame; n is the number of pixel rows and m the number of pixels per row in each frame; G is the total number of frames; the inner minimum is taken over i from 1 to n and j from 1 to m, and the outer minimum over a from 1 to G.

Step S302: apply fidelity compression to the sub-video from the fidelity-compression pixel value using the following formula (2):

L̃_a(i,j) = L_a(i,j) − l,  i = 1,...,n; j = 1,...,m    (2)

where L̃_a denotes the pixel data (in pixel-matrix form) of the a-th frame after fidelity compression; i runs from 1 to n and j from 1 to m so that every pixel is computed.

Step S303: judge from the compressed sub-video data whether the compression is effective, and control whether the compressed data must be restored, using the following formula (3):

Y = 1 if h(compressed sub-video data) ≥ h(original sub-video data), otherwise Y = 0    (3)

where Y is the restoration control value of the data and h() gives the data volume of the data in its parentheses.
If Y = 1, the fidelity-compressed sub-video needs to be restored;
if Y = 0, it does not need to be restored.
The beneficial effects of the above technical scheme are as follows: formula (1) screens out the fidelity compression pixel value of the video from the environment scene sub-video, and this value is stored together with the compressed video to facilitate subsequent decompression; formula (2) then performs the fidelity compression processing according to the fidelity compression pixel value, making the compression fast and efficient and improving the operating efficiency of the system; finally, formula (3) judges from the compressed environment scene sub-video data whether the compression is effective and controls whether the compressed data needs to be restored, so that ineffective compression is undone and the reliability of the video compression is guaranteed.
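The formula images themselves are not reproduced in the text, so the sketch below implements only one plausible reading of steps S301 and S302 based on the surviving variable definitions: the minimum pixel value over all frames is screened out as the fidelity compression pixel value l, compression subtracts l from every pixel, and l is kept so the data remain losslessly restorable (step S303's data-volume comparison via H() is not modeled here). All names are illustrative:

```python
def fidelity_compress(frames):
    """Hypothetical reading of formulas (1) and (2): screen out the minimum
    pixel value l across all frames, rows, and columns, then subtract it
    from every pixel. frames is a list of frames, each a list of pixel rows."""
    l = min(p for frame in frames for row in frame for p in row)
    compressed = [[[p - l for p in row] for row in frame] for frame in frames]
    return l, compressed

def fidelity_restore(l, compressed):
    """Invert the compression by adding the stored offset l back,
    recovering the original pixel data exactly (the 'fidelity' property)."""
    return [[[p + l for p in row] for row in frame] for frame in compressed]
```

Because the offset is stored alongside the compressed data, restoration is exact, which matches the scheme's requirement that ineffective compression can be undone.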
Preferably, in step S4, splicing the pictures of all the environment scene sub-videos according to the shooting orientation of each environment scene sub-video to obtain the corresponding environment panoramic scene video specifically includes:
seamlessly splicing the pictures of all the environment scene sub-videos according to the shooting orientation and the shooting time axis of each environment scene sub-video, thereby obtaining the corresponding environment panoramic scene video.
The beneficial effects of the above technical scheme are as follows: seamlessly splicing the pictures of all the environment scene sub-videos according to the shooting orientation and shooting time axis of each sub-video ensures that the resulting environment panoramic scene video comprehensively and faithfully reflects the real-time dynamics of the character object across the whole environment scene.
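The orientation-and-time-axis splicing can be sketched as follows, assuming each sub-video is keyed by its shooting azimuth in degrees and all sub-videos share a common time axis; frames and pixels are simplified to nested lists, and the frame-alignment strategy is an assumption for illustration:

```python
def stitch_panorama(sub_videos):
    """sub_videos: {azimuth_degrees: [frame, ...]}, each frame a list of
    pixel rows. Frames that share the same time-axis index are joined
    left-to-right in ascending azimuth order to form one panoramic frame."""
    ordered = [sub_videos[az] for az in sorted(sub_videos)]
    panorama = []
    for frames_at_t in zip(*ordered):  # frames sharing one time-axis position
        # concatenate matching pixel rows across the azimuth-ordered frames
        rows = [sum(rows_at_y, []) for rows_at_y in zip(*frames_at_t)]
        panorama.append(rows)
    return panorama
```

A production system would blend overlapping edges rather than butt-join rows; this sketch only shows the azimuth ordering and time-axis synchronization that the scheme relies on.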
As described in the above embodiments, the automatic video abstract generation method based on deep learning synchronously shoots different azimuth areas of the same environment scene to obtain a plurality of environment scene sub-videos; it identifies the environment scene sub-videos to obtain semantic labels for the different objects appearing in them, and then forms a video content abstract in a preset picture of each environment scene sub-video; finally, it splices the pictures of all the environment scene sub-videos according to their shooting orientations to obtain the corresponding environment panoramic scene video. In this way, the environment scene sub-videos shot by different cameras can be synchronously identified and analyzed, the objects in them calibrated, and matched video content abstracts generated, so that the videos are screened and identified comprehensively and accurately, improving the automation and intelligence of video identification processing.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (9)
1. An automatic video abstract generation method based on deep learning, characterized by comprising the following steps:
step S1, synchronously shooting different azimuth areas of the same environment scene through a plurality of cameras, thereby acquiring a plurality of environment scene sub-videos; grouping all the environment scene sub-videos by shooting orientation and storing them in a blockchain;
step S2, extracting the corresponding environment scene sub-videos from the blockchain according to a video acquisition request from a video processing terminal, and transmitting them to the video processing terminal; then identifying the environment scene sub-videos to obtain semantic labels for the different objects appearing in them;
step S3, forming a video content abstract in a preset picture of the environment scene sub-video according to the semantic labels; then performing data compression processing on the environment scene sub-video;
step S4, splicing the pictures of all the environment scene sub-videos according to the shooting orientation of each environment scene sub-video, thereby obtaining the corresponding environment panoramic scene video.
2. The automatic video abstract generation method based on deep learning according to claim 1, characterized in that:
in step S1, synchronously shooting different azimuth areas of the same environment scene through the plurality of cameras to acquire the plurality of environment scene sub-videos specifically includes:
aiming the shooting directions of the plurality of cameras at different azimuth areas along the circumferential direction of the same environment scene while adjusting the shooting angle of view of each camera, so that the combined angles of view of all the cameras completely cover the entire circumferential azimuth area of the environment scene;
then instructing all the cameras to shoot synchronously at the same focal length, thereby acquiring the plurality of environment scene sub-videos.
3. The automatic video abstract generation method based on deep learning according to claim 2, characterized in that:
in step S1, grouping all the environment scene sub-videos by shooting orientation and storing them in the blockchain specifically includes:
acquiring the shooting orientation information of each camera and adding it to the corresponding environment scene sub-video as video index information; then storing all the environment scene sub-video groups in the blockchain.
4. The automatic video abstract generation method based on deep learning according to claim 3, characterized in that:
in step S2, extracting the corresponding environment scene sub-videos from the blockchain according to the video acquisition request from the video processing terminal and transmitting them to the video processing terminal specifically includes:
extracting the video shooting time range condition from the video acquisition request of the video processing terminal, and extracting the environment scene sub-videos matching that time range from the blockchain; then synchronously transmitting all the extracted environment scene sub-videos to the video processing terminal.
5. The automatic video abstract generation method based on deep learning according to claim 4, characterized in that:
in step S2, identifying the environment scene sub-videos to obtain semantic labels for the different objects appearing in them specifically includes:
decomposing each environment scene sub-video into a plurality of environment scene picture frames in order of its video-stream time axis;
identifying each environment scene picture frame to obtain the identity attribute information and action attribute information of the different objects initially selected in it;
generating an identity attribute semantic label and an action attribute semantic label for each object according to the identity attribute information and the action attribute information.
6. The automatic video abstract generation method based on deep learning according to claim 5, characterized in that:
in step S3, forming the video content abstract in the preset picture of the environment scene sub-video according to the semantic labels specifically includes:
generating a text abstract of the identity state and the action state of the object according to the identity attribute semantic label and the action attribute semantic label;
selecting a preset abstract-adding picture area in the environment scene picture frames in which the object appears, wherein the abstract-adding picture area does not overlap the picture area occupied by the object in those frames;
adding the text abstract to the abstract-adding picture area and then displaying the text abstract with its font enlarged.
7. The automatic video abstract generation method based on deep learning according to claim 6, characterized in that:
in step S3, the data compression processing on the environment scene sub-video specifically includes:
recombining all the environment scene picture frames in order of the video-stream time axis of the environment scene sub-video to reconstruct the environment scene sub-video, and then performing fidelity compression processing on the reconstructed environment scene sub-video.
8. The automatic video abstract generation method based on deep learning according to claim 7, characterized in that:
in step S3, performing the fidelity compression processing on the environment scene sub-video specifically includes:
step S301, screening out the fidelity compression pixel value of the video from the environment scene sub-video using the following formula (1):
in formula (1), l represents the fidelity compression pixel value of the environment scene sub-video; L_a(i, j) represents the pixel value of the pixel point in row i and column j of the a-th frame image of the environment scene sub-video; m represents the number of pixel points in each row of each frame image of the environment scene sub-video; n represents the number of pixel points in each column of each frame image of the environment scene sub-video; the inner minimum is taken over the brackets with i running from 1 to n and j running from 1 to m; G represents the total number of frames of the environment scene sub-video; the outer minimum is taken over the brackets with a running from 1 to G;
step S302, performing fidelity compression processing on the environment scene sub-video according to the fidelity compression pixel value using the following formula (2):
in formula (2), the left-hand side represents the pixel data (in pixel-matrix form) of the a-th frame image after fidelity compression of the environment scene sub-video; the calculation in the brackets is carried out for every i from 1 to n and every j from 1 to m;
step S303, judging from the compressed environment scene sub-video data whether the compression is effective, and controlling whether the compressed data needs to be restored, using the following formula (3):
in formula (3), Y represents the restoration control value of the data; H() returns the data volume of the data in the parentheses;
if Y = 1, the fidelity-compressed environment scene sub-video needs to be restored;
if Y = 0, the fidelity-compressed environment scene sub-video does not need to be restored.
9. The automatic video abstract generation method based on deep learning according to claim 7, characterized in that:
in step S4, splicing the pictures of all the environment scene sub-videos according to the shooting orientation of each environment scene sub-video to obtain the corresponding environment panoramic scene video specifically includes:
seamlessly splicing the pictures of all the environment scene sub-videos according to the shooting orientation and the shooting time axis of each environment scene sub-video, thereby obtaining the corresponding environment panoramic scene video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210337196.8A CN114710713B (en) | 2022-03-31 | 2022-03-31 | Automatic video abstract generation method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114710713A true CN114710713A (en) | 2022-07-05 |
CN114710713B CN114710713B (en) | 2023-08-01 |
Family
ID=82170441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210337196.8A Active CN114710713B (en) | 2022-03-31 | 2022-03-31 | Automatic video abstract generation method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114710713B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105100688A (en) * | 2014-05-12 | 2015-11-25 | 索尼公司 | Image processing method, image processing device and monitoring system |
US10230866B1 (en) * | 2015-09-30 | 2019-03-12 | Amazon Technologies, Inc. | Video ingestion and clip creation |
WO2019185170A1 (en) * | 2018-03-30 | 2019-10-03 | Toyota Motor Europe | Electronic device, robotic system and method for localizing a robotic system |
CN112015231A (en) * | 2020-07-31 | 2020-12-01 | 中标慧安信息技术股份有限公司 | Method and system for processing surveillance video partition |
CN112052841A (en) * | 2020-10-12 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Video abstract generation method and related device |
CN113052753A (en) * | 2019-12-26 | 2021-06-29 | 百度在线网络技术(北京)有限公司 | Panoramic topological structure generation method, device, equipment and readable storage medium |
WO2022033252A1 (en) * | 2020-08-14 | 2022-02-17 | 支付宝(杭州)信息技术有限公司 | Video matching method and apparatus, and blockchain-based infringement evidence storage method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN114710713B (en) | 2023-08-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||