CN112101075A - Information implantation area identification method and device, storage medium and electronic equipment - Google Patents

Information implantation area identification method and device, storage medium and electronic equipment

Info

Publication number
CN112101075A
CN112101075A (application CN201910528228.0A)
Authority
CN
China
Prior art keywords
frame
video
image frame
identified
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910528228.0A
Other languages
Chinese (zh)
Other versions
CN112101075B (en)
Inventor
高琛琼
谢年华
殷泽龙
肖泽东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910528228.0A priority Critical patent/CN112101075B/en
Publication of CN112101075A publication Critical patent/CN112101075A/en
Application granted granted Critical
Publication of CN112101075B publication Critical patent/CN112101075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0276Advertisement creation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Image Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Signal Processing (AREA)

Abstract

The invention provides a method and a device for identifying an information implantation area, a computer storage medium, and electronic equipment. The method comprises the following steps: acquiring a video to be processed, and segmenting it to obtain a plurality of video segments; determining an image frame to be identified from each video segment, and classifying the pixels in the image frame to be identified to obtain the pixel types it contains; determining, according to the pixel types, whether a target object exists in the image frame to be identified; and, when the target object exists in the image frame to be identified, acquiring a frame sequence segment containing the target object in the video segment according to a preset rule, and taking the position of the target object in the frame sequence segment as an information implantation area. The invention can automatically determine which frame sequence segments and which areas in a video can receive implanted information, thereby avoiding manual screening and marking, improving video advertisement implantation efficiency, and ensuring the continuity of video advertisement implantation.

Description

Information implantation area identification method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for identifying an information implantation area, a computer storage medium, and an electronic device.
Background
With the gradual maturation of electronic information, advertisements have evolved from early printed advertisements to electronic media advertisements. Taking television advertisements as an example, to improve the user reach of a product, advertisers implant their product advertisements into television videos, so that users watching those videos see the implanted advertisements and thus receive the advertising information.
A recent technique for placing advertisements in videos is soft implantation: after a video is obtained, advertisement positions are screened and marked manually before the advertisements are implanted. Owing to the diversity and uncertainty of video content, manually identifying the implantable regions and implantation times in a video consumes a great deal of manpower and seriously affects production efficiency and output.
In view of the above, there is a need in the art to develop a new method for identifying information-embedded regions.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
Embodiments of the present invention provide an information embedding area identification method, an information embedding area identification device, a computer storage medium, and an electronic device, which can reduce time consumed by manual screening and labeling, reduce labor cost, and improve production efficiency and yield to at least a certain extent.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of an embodiment of the present invention, there is provided an information implantation area identification method including: acquiring a video to be processed, and segmenting the video to be processed to acquire a plurality of video segments; determining an image frame to be identified from each video fragment, and classifying pixels in the image frame to be identified to obtain the pixel type in the image frame to be identified; determining whether a target object exists in the image frame to be identified according to the pixel type; when the target object exists in the image frame to be identified, acquiring a frame sequence section containing the target object in the video fragment according to a preset rule, and taking the position of the target object in the frame sequence section as an information implantation area.
According to an aspect of an embodiment of the present invention, there is provided an information implantation area identification apparatus including: the video segmentation module is used for acquiring a video to be processed and segmenting the video to be processed to acquire a plurality of video segments; the pixel classification module is used for determining an image frame to be identified from each video fragment and classifying pixels in the image frame to be identified so as to acquire the pixel type in the image frame to be identified; the object judging module is used for determining whether a target object exists in the image frame to be identified according to the pixel type; and the region determining module is used for acquiring a frame sequence segment containing the target object in the video fragment according to a preset rule when the target object exists in the image frame to be identified, and taking the position of the target object in the frame sequence segment as an information implantation region.
In some embodiments of the present invention, based on the foregoing scheme, the video segmentation module includes: the characteristic extraction unit is used for extracting target characteristics from the video to be processed; and the similarity identification unit is used for carrying out similarity algorithm identification on adjacent image frames and segmenting the video to be processed according to an identification result so as to obtain the plurality of video fragments.
In some embodiments of the invention, based on the foregoing scheme, the similarity identification unit is configured to: calculating the distance between target features in the adjacent image frames, and performing similarity algorithm identification according to the distance; when the distance is smaller than a preset distance threshold value, the adjacent image frames belong to the same video fragment; and when the distance is greater than or equal to the preset distance threshold, the adjacent image frames belong to different video slices.
In some embodiments of the invention, the video slices are still video slices; based on the foregoing, the pixel classification module is configured to: taking any frame in the still video fragment as the image frame to be identified.
In some embodiments of the invention, the video slices are motion video slices; based on the foregoing, the pixel classification module is configured to: taking a first frame in the video fragment as a starting frame, and calculating the motion amount of pixels in adjacent image frames; and if the motion amount of the pixels in the first target image frame is larger than or equal to a first preset motion amount threshold value, taking the starting frame and the first target image frame as the image frame to be identified.
In some embodiments of the present invention, based on the foregoing scheme, the pixel classification module includes: the pixel classification unit is used for inputting the image frame to be identified into a depth encoder decoder network model, encoding and decoding the image frame to be identified through the depth encoder decoder network model so as to classify pixels in the image frame to be identified and acquire the pixel type in the image frame to be identified.
In some embodiments of the invention, the depth coder decoder network model comprises a cascaded encoder and decoder; based on the foregoing scheme, the pixel classification unit is configured to: performing convolution from a lower layer to a higher layer on the image frame to be identified through the encoder to obtain a plurality of characteristics; integrating the plurality of features to form a coded feature; and upsampling the coding features through the decoder to acquire pixel classification information with the same size as the image frame to be identified, wherein the pixel classification information comprises a pixel type.
In some embodiments of the present invention, based on the foregoing solution, the object determination module is configured to: determining an object to be processed according to the pixel type of the pixels in the image frame to be recognized; acquiring the area ratio of the object to be processed in the image frame to be identified; comparing the area ratio with a preset ratio threshold value, and determining the target object according to a comparison result; and if the area occupation ratio corresponding to the target object to be processed is larger than the preset occupation ratio threshold, taking the target object to be processed as the target object.
In some embodiments of the present invention, based on the foregoing solution, the pixel types include a first pixel type, a second pixel type, and a third pixel type, where a target object corresponding to the first pixel type is used to provide an implantation carrier for information to be implanted; the target object corresponding to the second pixel type is used for providing an implantation plane for information to be implanted; and the target object corresponding to the third pixel type is an invalid implantation area.
In some embodiments of the invention, the video slices are still video slices; based on the foregoing, the region determining module is configured to: taking a first frame containing the target object as a starting frame and taking a last frame of the video fragment as an ending frame; and ordering the starting frame, all frames between the starting frame and the ending frame, and the ending frame in time to form a frame sequence segment, and implanting information to be implanted at the position where the target object is located in the frame sequence segment.
In some embodiments of the present invention, based on the foregoing solution, the region determining module includes: the motion amount calculation unit is used for calculating the motion amount of pixels in a preset image frame relative to pixels in an initial frame by taking all image frames to be identified containing the target object as the initial frame, and the frame number of the preset image frame and the frame number of the initial frame are separated by a preset value; the selection unit is used for taking the preset image frame as a second target image frame when the motion amount meets a preset condition; the acquisition unit is used for taking the second target image frame as a starting frame and repeating the steps to acquire all second target image frames meeting the preset condition; the sequencing unit is used for sequencing the image frames containing the target object in the second target image frame according to time to form a frame sequence and acquiring the duration of the frame sequence; and the frame sequence segment generating unit is used for taking the frame sequence or the subsequence of the frame sequence as the frame sequence segment and implanting information to be implanted into the position where the target object in the frame sequence segment is located when the duration is greater than or equal to a preset time threshold.
In some embodiments of the present invention, based on the foregoing solution, the selecting unit is configured to: comparing the motion amount with a first preset motion amount threshold and a second preset motion amount threshold; and when the motion amount is greater than or equal to the second preset motion amount threshold and less than the first preset motion amount threshold, taking the preset image frame as the second target image frame.
According to an aspect of the embodiments of the present invention, there is provided a computer storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the method for identifying an information implantation region as described in the above embodiments.
According to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of identifying an information implantation area as described in the above embodiments.
In the technical scheme provided by some embodiments of the present invention, shot segmentation is performed on an acquired video to be processed to obtain a plurality of video slices; an image frame to be identified is then determined from each video slice, the pixels in that frame are classified to obtain the pixel types it contains, and whether a target object exists in the frame is determined according to those pixel types; when the target object exists in the image frame to be identified, a frame sequence segment containing the target object is acquired from the video slice according to a preset rule, and the position of the target object in the frame sequence segment is the information implantation area. With this technical scheme, on the one hand, whether an information implantation area exists in a video, and which frame sequence segments can receive implanted information, can be determined automatically, avoiding manual screening and marking, reducing labor cost, and improving video advertisement implantation efficiency; on the other hand, the information implantation areas in different types of video slices can be identified with different identification methods, ensuring the continuity of video advertisement implantation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the invention may be applied;
FIG. 2 schematically illustrates a flow chart of a method of identifying an information implantation region according to one embodiment of the invention;
FIG. 3 schematically illustrates a flow diagram for shot slicing according to one embodiment of the invention;
FIG. 4 is a schematic diagram illustrating a similarity comparison process according to an embodiment of the present invention;
FIG. 5 schematically illustrates a flow diagram for determining an image frame to be identified according to one embodiment of the invention;
FIGS. 6A-6B schematically illustrate an interface schematic of a grafting business opportunity according to one embodiment of the present invention;
FIGS. 7A-7B schematically illustrate an interface schematic of an out-of-nothing business opportunity according to one embodiment of the invention;
FIG. 8 is a schematic diagram illustrating a target object determination flow according to an embodiment of the invention;
FIG. 9 schematically shows a flowchart for obtaining a frame sequence segment according to an embodiment of the invention;
fig. 10 schematically shows a block diagram of an identification apparatus of an information implantation area according to an embodiment of the present invention;
FIG. 11 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present invention can be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, and of course, a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the numbers of terminals, networks, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, and servers according to practical needs. For example, the server 105 may be a server cluster comprised of multiple servers.
In an embodiment of the present invention, after obtaining a video to be processed, a terminal device (the terminal device 101, or equally the terminal devices 102 and 103) may send the image data corresponding to the video to be processed to the server 105 through the network 104. The server 105 performs shot segmentation on the video to be processed according to the similarity between any two adjacent frames, so as to obtain a plurality of video segments; it then screens the image frames in each video segment to determine the image frames to be identified. Next, the server 105 may process each image frame to be identified through a machine learning model to classify its pixels, obtain the pixel types in the frame, and determine according to the pixel types whether a target object exists in the frame, where the target object is an object that can be used for information implantation, also referred to as a business opportunity, such as a billboard or a desktop. Finally, when a target object exists in the image frame to be identified, a frame sequence segment containing the target object is acquired from each video segment according to a preset rule, and the position of the target object in the frame sequence segment is taken as the information implantation area for implanting the video advertisement. With the technical scheme of the embodiment of the invention, on the one hand, manual screening and marking of the information implantation area can be avoided, automatic identification of the information implantation area is realized, labor cost is reduced, and implantation efficiency is improved; on the other hand, the frame sequence segment having an information implantation area within a video segment can be obtained, ensuring the continuity of information implantation.
It should be noted that the method for identifying an information implantation area provided by the embodiment of the present invention is generally executed by the server 105, and accordingly, the apparatus for identifying an information implantation area is generally disposed in the server 105. However, in other embodiments of the present invention, the terminal may also have a similar function as the server, so as to execute the identification scheme of the information implantation area provided by the embodiment of the present invention.
In the related art in this field, taking advertisement placement as an example, there are three methods for identifying advertisement placement business opportunities in a video: (1) color-texture clustering: the color and texture features of all pixels in an image frame are extracted, the pixels are classified with a clustering algorithm using a manually set cluster-number threshold to obtain the categories of different pixels, and whether a business opportunity exists is determined; (2) edge detection: an edge detection algorithm is used to take positions with quadrilateral edges in an image frame as business opportunities; (3) manual selection: all videos are watched manually to determine the time positions of implantable frames.
However, each of the above methods for identifying business opportunities has corresponding disadvantages. Color-texture clustering requires a manually selected category threshold, cannot be applied to all scenes, distinguishes poorly between regions such as the sky and the ground whose color and texture are finely detailed, has a low degree of automation, and requires frequent adjustment. Edge detection is strongly affected by illumination, its contour extraction is unstable, and a quadrilateral contour is not necessarily an implantable business-opportunity carrier, so its accuracy is low. Manual selection takes a long time and seriously affects production efficiency and output.
In view of the problems in the related art, the embodiments of the present invention first provide a method for identifying an information implantation area, where the method for identifying an information implantation area in the embodiments of the present invention can be used for video advertisement implantation and the like, and details of implementation of the technical solution in the embodiments of the present invention are described in detail below by taking video advertisement implantation as an example:
fig. 2 schematically shows a flowchart of an identification method of an information implantation area according to an embodiment of the present invention, which may be performed by a server, which may be the server shown in fig. 1. Referring to fig. 2, the method for identifying an information-embedded area at least includes steps S210 to S240, which are described in detail as follows:
in step S210, a video to be processed is obtained, and the video to be processed is segmented to obtain a plurality of video segments.
In an embodiment of the present invention, video advertisement implantation is a novel technical system for intelligently implanting advertisements in a finished video by using a computer vision technology, and a user may obtain a video to be processed in a manner of searching for a video online, or may obtain a video to be processed from a video folder or a video database of the terminal device 101, where the video to be processed may be a video file of any format, such as avi, mp4, rmvb, and the like, which is not specifically limited in the embodiment of the present invention. After determining the video to be processed, the terminal device 101 may send the video to be processed to the server 105, and the server 105 may process the video to be processed to identify the information implantation area therein.
In one embodiment of the invention, the basic structure of a video is a hierarchy composed of frames, shots, scenes, and video programs. A frame is a static image and the smallest logical unit of a video; a temporally continuous sequence of frames played at equal intervals forms a dynamic video. A shot is a frame sequence captured continuously from the startup to the shutdown of a camera; it depicts part of an event or a scene, has no or weak semantic information, and emphasizes the visual similarity of its frames. A scene is a semantically related sequence of shots, which may shoot the same object from different angles and with different techniques, or combine shots sharing the same theme and event, thus emphasizing semantic relevance. A video program contains a complete event or story; it is the highest-level video content structure and comprises the compositional relations of the video as well as its abstract, semantics, and general description. In order to identify information implantation areas effectively, each event or scene in the video can be treated as an object for identification, so after the video to be processed is obtained, it can be segmented into a plurality of video segments corresponding to different events or scenes; these video segments are the shots in the video composition structure. Specifically, after the video to be processed is obtained, shot segmentation can be performed on it so as to divide it into a plurality of video segments.
Fig. 3 is a schematic flowchart illustrating the shot-slicing process. As shown in fig. 3, in step S301, target features are extracted from the video to be processed; in step S302, similarity algorithm identification is performed on the target features in adjacent image frames, and the video to be processed is segmented according to the identification result so as to obtain a plurality of video segments. During similarity identification, every pixel in two adjacent image frames could be compared for similarity; however, because an image contains a large number of pixels, comparing them one by one would occupy a large amount of resources and make data processing inefficient. Target features can therefore be extracted from the video to be processed; these target features may be multi-dimensional features of the image frames contained in the video, and the boundary image frames of adjacent video segments are determined by performing similarity algorithm identification on the target features in adjacent image frames.
In an embodiment of the present invention, in step S302, similarity algorithm identification is performed on target features in adjacent image frames, and a to-be-processed video is segmented according to an identification result to obtain a plurality of video segments, which may be specifically implemented according to a schematic flow diagram of similarity algorithm identification shown in fig. 4, as shown in fig. 4, the flow of similarity algorithm identification mainly includes steps S401 to S403, specifically:
in step S401, calculating the distance between target features in adjacent image frames, and performing similarity algorithm identification according to the distance;
in one embodiment of the present invention, in performing similarity algorithm identification, the similarity may be determined by calculating a distance between target features in two adjacent image frames, which may be a euclidean distance, a cosine distance, or the like,taking the Euclidean distance as an example, two adjacent image frames respectively correspond to a time t and a time t plus Deltat, the position of the target feature A at the time t is (x1, y1), the position of the target feature A at the time t plus Deltat is (x2, y2), and the Euclidean distance between the target features in the adjacent image frames can be obtained according to the Euclidean distance calculation formula
Figure BDA0002098886200000091
And then comparing the Euclidean distance with a preset distance threshold value, judging the similarity between adjacent image frames, and further carrying out video segmentation according to the similarity.
In step S402, when the distance is smaller than the preset distance threshold, the adjacent image frames belong to the same video segment.
In step S403, when the distance is greater than or equal to the preset distance threshold, the adjacent image frames belong to different video slices.
In an embodiment of the present invention, if the euclidean distance between the target features in the adjacent image frames is greater than or equal to the preset distance threshold, it indicates that the image has a large change, so that the two image frames can be divided into different video slices respectively; if the Euclidean distance between the target features in the adjacent image frames is smaller than the preset distance threshold, the image is not greatly changed, and therefore the two image frames can be divided into the same video fragment.
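As a concrete illustration, the following is a minimal sketch of steps S401-S403 in Python. It assumes each frame has already been reduced to a target feature vector; the feature extractor and the threshold value are assumptions for illustration, not fixed by the embodiment.

```python
import numpy as np

def slice_video(features: list, dist_threshold: float) -> list:
    """Group frame indices into video slices: adjacent frames whose target
    features are closer than the threshold share a slice (S402); otherwise
    a new slice begins (S403). Assumes at least one frame."""
    slices = [[0]]
    for i in range(1, len(features)):
        # Euclidean distance between adjacent frames' target features (S401)
        dist = float(np.linalg.norm(features[i] - features[i - 1]))
        if dist < dist_threshold:
            slices[-1].append(i)   # similar: same video slice
        else:
            slices.append([i])     # changed: boundary between slices
    return slices
```

In use, `features` could be any per-frame descriptor (for example, a color histogram); the threshold would be tuned to the chosen feature space.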
In step S220, an image frame to be identified is determined from each of the video slices, and pixels in the image frame to be identified are classified to obtain a pixel type in the image frame to be identified.
In one embodiment of the invention, when video mining is performed, the mining efficiency can be improved by representing the video slices by typical and representative frames in the video slices, such as key frames or representative frames, so that after the video slices are segmented, one or more image frames to be identified can be determined from each video slice. In order to identify the region available for information implantation in the video to be processed, it is necessary to identify the object in the image frame to be identified, and in order to acquire the object in the image frame to be identified, it is first necessary to classify the pixels in the image frame to be identified to acquire the pixel type in the image frame to be identified.
In one embodiment of the present invention, the method for determining the image frame to be identified differs with the type of the video slice. Video slices can be divided into still video slices and moving video slices: a still video slice is shot with a fixed camera, while a moving video slice is shot with a moving camera. When a video slice is a still video slice, the picture in the slice is basically unchanged, so any frame in the slice can be selected as the image frame to be identified; for simplicity, the first frame of the slice can be chosen. When the video slice is a moving video slice, the objects change substantially from frame to frame, so a single frame cannot simply be selected as the image frame to be identified.
Fig. 5 is a schematic flowchart illustrating the process of determining the image frames to be identified. As shown in fig. 5, in step S501, the motion amount of pixels in adjacent image frames is calculated, taking the first frame in the video slice as the starting frame. Specifically, the positions of a pixel in any two adjacent image frames of the video slice are obtained, the position in the later frame is compared with the position in the earlier frame to obtain the motion amount of the pixel between the two frames, and the degree of change between the frames of the video slice is then judged from this motion amount. In step S502, if the motion amount of the pixels in a first target image frame is greater than or equal to a first preset motion amount threshold, the starting frame and the first target image frame are taken as the image frames to be identified. When the motion amount of the pixels in adjacent image frames is smaller than the first preset motion amount threshold, the objects in the adjacent frames change little; when it is greater than or equal to the threshold, the objects change substantially. Therefore, to determine which image frames in a video slice have an information implantation area (business opportunity), every image frame whose pixel motion amount is greater than or equal to the first preset motion amount threshold can be taken as an image frame to be identified. The first preset motion amount threshold can be set according to actual needs; in the embodiment of the present invention it may be, for example, 10 pixels, which is not specifically limited. The motion amount of a pixel may be determined by an optical flow method, where optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane; specifically, the motion amount may be calculated by a sparse optical flow method.
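The following is a minimal sketch of the motion-amount check in steps S501-S502, written in Python with OpenCV's sparse (Lucas-Kanade) optical flow. The corner-detection parameters and the helper name `mean_motion` are assumptions for illustration; only the 10-pixel threshold comes from the example above.

```python
import cv2
import numpy as np

FIRST_MOTION_THRESHOLD = 10.0  # pixels, per the example in the text

def mean_motion(start_gray: np.ndarray, target_gray: np.ndarray) -> float:
    """Average displacement (in pixels) of sparsely tracked points between
    two grayscale frames, via Lucas-Kanade sparse optical flow."""
    pts = cv2.goodFeaturesToTrack(start_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return 0.0
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(start_gray, target_gray,
                                                 pts, None)
    ok = status.flatten() == 1
    if not ok.any():
        return 0.0
    return float(np.linalg.norm((nxt - pts)[ok], axis=-1).mean())

# A frame whose motion relative to the starting frame reaches the threshold
# becomes an additional image frame to be identified (step S502):
# if mean_motion(start, frame) >= FIRST_MOTION_THRESHOLD: ...
```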
In an embodiment of the present invention, pixels in the image frame to be recognized may be classified by a depth encoder/decoder network model to determine whether an information embedding region (target object) exists in the image frame to be recognized. The depth encoder decoder network model comprises an encoder and a decoder which are cascaded, wherein the encoder can perform encoding processing on an image frame to be identified to form encoding characteristics, and the decoder can decode the encoding characteristics output by the encoder to realize classification of pixels in the image frame to be identified. Specifically, an image frame to be identified can be input into a depth encoder/decoder network model, and the image frame to be identified is subjected to convolution from a low layer to a high layer through an encoder to obtain a plurality of characteristics; then integrating the plurality of features to form coding features; and finally, the encoder outputs the coding characteristics to a decoder, and the decoder performs up-sampling on the coding characteristics to acquire pixel classification information with the same size as the image frame to be identified, wherein the pixel classification information comprises the pixel type. The pixel type may be a specific value, such as the number 2 for ground, the number 10 for sky, etc.
In one embodiment of the present invention, there are 22 pixel types: wall, building, indoor floor, outdoor floor, dining table, desk, person, house, window, door, box, poster billboard, screen, car, post, computer, television, counter, stage, display, other foreground, and other background. These 22 pixel types can be divided into three categories: a first pixel category, a second pixel category, and a third pixel category. The target object corresponding to the first pixel category provides an implantation carrier for the information to be implanted; the target object corresponding to the second pixel category provides an implantation plane for the information to be implanted; and the target object corresponding to the third pixel category is an invalid implantation area. Specifically, the implantation carrier may be, for example, a box, a poster billboard, a screen, the side of a vehicle, a computer screen, a television screen, or a display screen; when an advertisement is implanted, the advertisement to be implanted can be filled into the implantation carrier. This type of advertisement placement position is also called a grafting business opportunity. FIGS. 6A-6B show an interface schematic of a grafting business opportunity: in FIG. 6A, the poster billboard marked by a rectangular box is the implantation carrier, and the advertisement to be implanted can be implanted onto the poster billboard, as shown in FIG. 6B. The implantation plane may be, for example, a wall surface, an indoor floor, an outdoor floor, a building, a dining table, a desk, a house, a window, a person, a post, a counter, or a stage; when an advertisement is implanted, the advertisement to be implanted can be implanted onto part of the implantation plane. This type of advertisement placement position is also called an out-of-nothing business opportunity. FIGS. 7A-7B show an interface schematic of an out-of-nothing business opportunity: as shown in FIG. 7A, there is a wall behind the host with no advertisement on it, so the advertisement to be implanted can be implanted into the upper left corner of the wall surface, as shown in FIG. 7B.
Before the image frames to be recognized are processed through the depth encoder decoder network model, a large number of image frames can be collected as training samples to train the depth encoder decoder network model, for example, the image frames in one or more short videos can be used as the training samples, the training samples are input into the depth encoder decoder network model to be trained, the pixel classification results output by the model are compared with the pixel classification results corresponding to the training samples to judge the stability of the model, and when the loss function of the model is minimum, the model can be judged to complete the training.
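To make the structure concrete, here is a minimal sketch of a cascaded encoder-decoder pixel classifier in PyTorch. The embodiment describes the model only as a cascaded encoder and decoder with low-to-high-level convolutions and upsampling back to the input size, so the layer shapes below are assumptions for illustration; only the 22-class output follows the pixel types listed above.

```python
import torch
import torch.nn as nn

NUM_PIXEL_TYPES = 22  # wall, building, indoor floor, ..., other background

class PixelClassifier(nn.Module):
    """Cascaded encoder-decoder: convolve from low to high level, then
    upsample the coded features back to the input size to label every pixel."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # low-to-high-level convolutions
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(   # upsample the coded features
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, NUM_PIXEL_TYPES, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) with H and W divisible by 4; returns per-pixel logits.
        return self.decoder(self.encoder(x))

# pixel_types = PixelClassifier()(frame).argmax(dim=1)  # (N, H, W) type map
```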
In step S230, it is determined whether a target object exists in the image frame to be recognized according to the pixel type.
In an embodiment of the present invention, after the pixel types in the image frame to be identified are obtained through the depth encoder-decoder network model, the objects in the image frame can be determined from those pixel types, and it can then be determined whether a target object exists in the frame, where the target object is the grafting business opportunity or the out-of-nothing business opportunity described in step S220.
Fig. 8 is a schematic diagram illustrating a determination flow of a target object, and as shown in fig. 8, the determination flow of the target object mainly includes steps S801 to S803, specifically:
in step S801, an object to be processed is determined according to the pixel type of a pixel in an image frame to be recognized.
In an embodiment of the present invention, it may be determined which objects are contained in the image frame to be recognized, for example, which position is a person, which position is a table, and the like, according to the pixel type analyzed by the depth encoder/decoder network model, and thus the object to be processed in the image frame to be recognized may be determined according to the pixel type.
In step S802, an area ratio of the object to be processed in the image frame to be recognized is acquired.
In an embodiment of the present invention, the area ratio of each object to be processed in the image frame to be recognized may be determined according to the number of pixels of the same type and the total pixel amount included in the image frame to be recognized, and then whether the object to be processed is a target object may be determined according to the area ratio.
In step S803, if there is an area ratio corresponding to the target object to be processed that is greater than the preset ratio threshold, the target object to be processed is regarded as the target object.
In an embodiment of the present invention, a preset ratio threshold may be determined according to actual needs, and the area ratio corresponding to each object to be processed is compared with the preset ratio threshold, and when the area ratio of one or more target objects to be processed existing in the plurality of objects to be processed in the image frame to be recognized is greater than the preset ratio threshold, the target object to be processed may be used as a target object, and subsequent information implantation may be performed.
In an embodiment of the present invention, different objects to be processed are compared against different preset ratio thresholds. When the object to be processed is a grafting business opportunity such as a poster billboard or a computer screen, the preset ratio threshold may be set to 10%; when the object to be processed is an out-of-nothing business opportunity such as a desktop or the ground, the preset ratio threshold may be set to 25%. Of course, in the embodiment of the present invention the preset ratio threshold is not limited to these values and may be other percentages, which is not specifically limited.
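As a concrete illustration, the following is a minimal sketch of the area-ratio test in steps S801-S803, assuming the pixel classifier's output is a 2-D array of integer pixel types. The function name and the billboard type code are hypothetical; the ground's type code 2 and the 10% and 25% thresholds follow the examples in the text.

```python
import numpy as np

def find_target_objects(label_map: np.ndarray, thresholds: dict) -> list:
    """Return the pixel types whose area ratio in the frame exceeds
    their preset ratio threshold (steps S801-S803)."""
    total = label_map.size
    found = []
    for pixel_type, threshold in thresholds.items():
        ratio = np.count_nonzero(label_map == pixel_type) / total  # S802
        if ratio > threshold:                                      # S803
            found.append(pixel_type)
    return found

# Type code 2 (ground) follows the example above; code 11 for a poster
# billboard is hypothetical.
targets = find_target_objects(np.full((720, 1280), 2), {11: 0.10, 2: 0.25})
# -> [2]: the ground fills the frame, so it qualifies as a target object
```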
In step S240, when the target object exists in the image frame to be recognized, a frame sequence segment including the target object in the video fragment is obtained according to a preset rule, and a position of the target object in the frame sequence segment is used as an information implantation region.
In one embodiment of the present invention, there are typically a plurality of image frames in a video slice available for information implantation, that is, a plurality of image frames in the slice contain a grafting business opportunity or an out-of-nothing business opportunity, and these frames, ordered in time, form a frame sequence segment. The duration of the frame sequence segment also deserves close attention, to ensure the user reach of the advertisement: if a frame sequence segment available for advertisement placement is too short, the advertisement flashes by and has a low user reach, which can hurt the advertiser's interests.
In one embodiment of the present invention, the determination of the frame sequence segment varies with the type of the video slice. When the video slice is a still video slice, the image content within it does not substantially change, so the first frame containing the target object can be used as the starting frame and the last frame of the slice as the ending frame; the starting frame, all frames between the starting frame and the ending frame, and the ending frame are then ordered in time to form the frame sequence segment. For example, if every frame of a video slice, from the first to the last, contains a poster billboard, all frames of the slice can be arranged in time to form the frame sequence segment. A minimal sketch of this rule follows.
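This sketch assumes Python; `contains_target` stands in for the pixel-classification check described above and is a hypothetical helper, not part of the embodiment.

```python
from typing import Callable, List

def still_slice_segment(slice_frames: List,
                        contains_target: Callable) -> List:
    """Frame sequence segment of a still video slice: from the first frame
    containing the target object to the last frame of the slice."""
    for i, frame in enumerate(slice_frames):
        if contains_target(frame):   # first frame with the target object
            return slice_frames[i:]  # starting frame .. ending frame
    return []                        # no business opportunity in this slice
```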
In one embodiment of the present invention, when the video slice type is a moving video slice, a frame sequence segment into which information can be embedded may be determined according to the amount of motion of pixels in image frames spaced at a certain distance, based on the image frame to be recognized containing the target object determined in step S230.
Fig. 9 shows a schematic flowchart of a process for acquiring a frame sequence segment, and as shown in fig. 9, the method for identifying a frame sequence segment having an information embedding region from a video of a moving video slice at least includes steps S901-S905, specifically:
in step S901, with all image frames to be recognized including the target object as a starting frame, calculating a motion amount of pixels in a preset image frame relative to pixels in the starting frame, where a frame number of the preset image frame and a frame number of the starting frame are separated by a preset value.
In an embodiment of the present invention, the amount of motion of the target object in the preset image frame relative to the pixel corresponding to the target object in the starting frame may be obtained by the same method as in fig. 5, and no further description is given here in the embodiment of the present invention. The preset value may be one or any value greater than one, that is, the preset image frame may be an image frame adjacent to the starting frame, or an image frame spaced from the starting frame by a frame number greater than one.
In step S902, when the motion amount satisfies a preset condition, the preset image frame is taken as a second target image frame.
In an embodiment of the present invention, after the motion amount of the pixel is obtained, the motion amount may be compared with a first preset motion amount threshold and a second preset motion amount threshold, and when the motion amount is greater than or equal to the second preset motion amount threshold and less than the first preset motion amount threshold, the preset image frame is the second target image frame.
In step S903, the second target image frame is used as a start frame, and the above steps are repeated to obtain all second target image frames meeting the preset condition.
In step S904, the image frames of the second target image frame including the target object are ordered according to time to form a frame sequence, and a duration of the frame sequence is obtained.
In an embodiment of the present invention, after all the second target image frames are acquired, the depth encoder-decoder network model identifies the business opportunities in them to obtain the second target image frames that contain a target object; all such frames are then arranged in time order to form a frame sequence. The duration of the frame sequence can be determined from the sequence numbers of its first and last frames. For example, if the target image frames containing a business opportunity, arranged in time order, are frame 1, frame 50, and frame 75, then, at 25 frames per second, the duration of the frame sequence determined from the first frame's sequence number 1 and the last frame's sequence number 75 is about 3 s.
In step S905, when the duration is greater than or equal to a preset time threshold, the frame sequence is used as the frame sequence segment, and information to be implanted is implanted into a position where the target object in the frame sequence segment is located.
In an embodiment of the present invention, if the duration of the frame sequence is too short, the playing time of the implanted advertisement is short and the advertisement may flash by; for the advertiser, this means the advertisement's user reach and commercial conversion rate cannot be improved, harming the advertiser's interests. It is therefore necessary to select frame sequences that satisfy a duration condition. Specifically, the duration of the frame sequence can be compared with a preset time threshold to judge whether it satisfies the duration condition; the preset time threshold can be set according to actual needs, for example to 4 s, which the embodiment of the present invention does not specifically limit.
In an embodiment of the present invention, if the duration of the frame sequence is greater than or equal to the preset time threshold, which indicates that the duration of the frame sequence satisfies the duration condition, the frame sequence may be used as a finally determined frame sequence segment, and the information to be implanted is implanted in the position where the target object is located in the frame sequence segment. Specifically, all frame sequences greater than or equal to the preset time threshold may be output and advertised at a grafted wood business opportunity and/or a neutral business opportunity in the image frames contained therein.
Take a moving video slice as an example. Suppose that, from the relationship between the motion amount of pixels in adjacent image frames and the first preset motion amount threshold, the image frames to be identified in the slice are determined to be the 1st, 75th, 100th, 200th, and 400th frames, and the depth encoder-decoder network model identifies business opportunities in the 1st, 100th, and 200th frames, indicating that the slice contains business opportunities. The 1st, 100th, and 200th frames can then each be taken as a starting frame to obtain the target image frames whose pixel motion amounts satisfy the preset condition, and the target objects (business opportunities) in those target image frames are identified through the depth encoder-decoder network model to obtain the target image frames containing a target object. For example, with the 1st frame as the starting frame, suppose the target image frames that satisfy the preset condition and contain the target object are frames 1 to 74; the frame sequence is then frames 1 to 74, and at 25 frames per second its duration is close to 3 s. If the preset time threshold is 4 s, this duration is below the threshold and the sequence is unsuitable for advertisement implantation, so it can be ignored. With the 100th frame as the starting frame, suppose the target image frames that satisfy the preset condition and contain the target object are frames 76 to 199; frames 76 to 199 then form a frame sequence whose duration is close to 5 s, above the preset time threshold, so this frame sequence can be output as a frame sequence segment for advertisement implantation and the advertisement can be implanted. In addition, when judging the motion amount of pixels, the comparison can be made between image frames separated by a certain number of frames: for example, with the preset value set to 20, the pixel motion amounts between the 1st and 21st frames, and between the 21st and 41st frames, are compared, which reduces the amount of data to be processed and improves data processing efficiency.
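A minimal sketch of the duration test in this worked example, assuming 25 frames per second as above; the function name and the frame-index lists are illustrative.

```python
FPS = 25                      # frames per second, per the example above
PRESET_TIME_THRESHOLD = 4.0   # seconds

def duration_seconds(frame_indices: list) -> float:
    """Duration of a frame sequence from its first and last frame numbers."""
    return (max(frame_indices) - min(frame_indices)) / FPS

for seq in (list(range(1, 75)), list(range(76, 200))):
    d = duration_seconds(seq)  # ~2.9 s and ~4.9 s respectively
    if d >= PRESET_TIME_THRESHOLD:
        print(f"frames {seq[0]}-{seq[-1]}: output as frame sequence segment")
    else:
        print(f"frames {seq[0]}-{seq[-1]}: too short ({d:.1f} s), ignored")
```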
It should be noted that, within a frame sequence whose duration is greater than or equal to the preset time threshold, the interval between some frames may still be too long, which weakens the effect of the advertisement placement. Therefore, to improve the advertisement's user reach and commercial conversion rate, a subsequence of the frame sequence in which the business opportunity is continuous and whose duration is greater than or equal to the preset time threshold can be selected as the frame sequence segment for advertisement implantation.
In an embodiment of the present invention, the above embodiments describe the server 105 executing the method for identifying the information implantation area; the method may equally be executed by a terminal device, which may be the terminal device 101 shown in fig. 1, or the terminal device 102 or 103. Accordingly, a depth encoder-decoder network model is provided in the terminal device 101. After receiving the video to be processed, input by the user or obtained online, the terminal device 101 may perform shot slicing on it to obtain a plurality of video slices; determine one or more image frames to be identified from each video slice according to the slice type, and classify the pixels in those frames; then judge, according to the pixel classification result, whether a target object exists in the image frame to be identified, where the target object is a grafting business opportunity or an out-of-nothing business opportunity available for advertisement implantation; and finally, when the target object exists, acquire a frame sequence segment containing the target object from the video slice according to a preset rule and take the position of the target object in the frame sequence segment as the advertisement implantation area.
In an embodiment of the present invention, take as an example implanting an advertisement for a certain brand of cosmetics into an episode of a beauty program. After the program is recorded, its video may be uploaded to a terminal device, which may either upload the video to a server for processing or process it locally. The video is first shot-sliced to obtain a plurality of video slices, and the type of each video slice is then determined. When a video slice is judged to be a still video slice, any frame in the slice, for example the first frame, may be selected as the image frame to be identified and input into the depth encoder-decoder network model to identify any graft-type and/or mid-insert business opportunity in it; if such a business opportunity exists in the image frame to be identified, all image frames containing the graft-type and/or mid-insert business opportunity are arranged in time order to form a frame sequence segment, which is output, and the advertisement for the brand of cosmetics is implanted into the business opportunity of the frame sequence segment. When a video slice is judged to be a motion video slice, all image frames to be identified in the slice may be determined according to the motion amount of pixels in adjacent image frames, and the graft-type and/or mid-insert business opportunities in these image frames are identified by the depth encoder-decoder network model to determine whether the video slice contains a business opportunity. Then, for the image frames to be identified that contain business opportunities, all target image frames satisfying the preset condition may be determined, again according to the motion amount of pixels in adjacent image frames, and the business opportunities in the target image frames are identified by the depth encoder-decoder network model to obtain the target image frames containing business opportunities. Finally, the target image frames containing business opportunities are arranged in time order, the duration of the resulting frame sequence is compared with the preset time threshold, the frame sequences whose duration is greater than or equal to the preset time threshold are output as frame sequence segments, and the advertisement for the brand of cosmetics is implanted into the business opportunities of those frame sequence segments. A sketch of this overall flow follows.
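The end-to-end flow of this example might be organized as below. All names here (shot_slice, is_still_slice, find_opportunities, target_frames_from) are hypothetical stand-ins for the components discussed in this embodiment, not functions defined by the present disclosure.

def identify_implantation_segments(video_frames):
    # Hypothetical top-level pipeline: shot slicing -> per-slice frame selection
    # -> business-opportunity identification -> frame sequence segments.
    segments = []
    for video_slice in shot_slice(video_frames):            # split on shot boundaries
        if is_still_slice(video_slice):
            key_frame = video_slice[0]                      # any frame works for a still slice
            if find_opportunities(key_frame):               # depth encoder-decoder inference
                segments.append(video_slice)
        else:
            candidates = frames_to_identify(video_slice)    # motion-based sampling
            starts = [i for i in candidates if find_opportunities(video_slice[i])]
            for start in starts:
                seq = target_frames_from(video_slice, start)  # frames meeting the motion condition
                if duration_seconds(seq) >= MIN_DURATION_S:
                    segments.append(seq)
    return segments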
According to the technical solution of the embodiments of the present invention, the image frames to be identified are determined from the video slices, and the business opportunities in them are identified by the trained depth encoder-decoder network model, so that the frame sequences containing business opportunities can be determined automatically, greatly improving the efficiency of video advertisement implantation; in addition, different methods can be adopted to determine the image frames to be identified for different types of video slices, and the frame sequence segments containing business opportunities are determined accordingly, which ensures the continuity of the business opportunities, avoids manual screening and marking of business opportunities in videos, and improves production efficiency and output.
The following describes embodiments of the apparatus of the present invention, which can be used to perform the method for identifying an information implantation area in the above-described embodiments of the present invention. For details that are not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the method for identifying an information implantation area of the present invention.
Fig. 10 schematically shows a block diagram of an apparatus for identifying an information implantation area according to an embodiment of the present invention.
Referring to fig. 10, an apparatus 1000 for identifying an information implantation area according to an embodiment of the present invention includes: a video segmentation module 1001, a pixel classification module 1002, an object judgment module 1003 and a region determination module 1004.
The video segmentation module 1001 is configured to obtain a video to be processed and slice it to obtain a plurality of video slices; the pixel classification module 1002 is configured to determine an image frame to be identified from each video slice and classify the pixels in the image frame to be identified to obtain the pixel types in the image frame to be identified; the object judgment module 1003 is configured to determine, according to the pixel types, whether a target object exists in the image frame to be identified; and the region determination module 1004 is configured to, when the target object exists in the image frame to be identified, obtain a frame sequence segment containing the target object from the video slice according to a preset rule, and use the position of the target object in the frame sequence segment as an information implantation area.
In one embodiment of the present invention, the video segmentation module 1001 includes: a feature extraction unit configured to extract target features from the video to be processed; and a similarity comparison unit configured to perform similarity algorithm identification on the target features of adjacent image frames and slice the video to be processed according to the identification result to obtain the plurality of video slices.
In an embodiment of the present invention, the similarity comparison unit is configured to: calculate the distance between the target features of adjacent image frames and perform the similarity algorithm identification according to that distance; when the distance is smaller than a preset distance threshold, the adjacent image frames belong to the same video slice; and when the distance is greater than or equal to the preset distance threshold, the adjacent image frames belong to different video slices.
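By way of illustration, the distance-based slicing of this unit could look as follows; the HSV color histogram as the target feature and the Euclidean distance with its threshold are assumptions made for the sketch.

import cv2
import numpy as np

DISTANCE_THRESHOLD = 0.35  # assumed preset distance threshold

def target_feature(frame):
    # One possible target feature: a normalized HSV color histogram.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [18, 16], [0, 180, 0, 256])
    return cv2.normalize(hist, None).flatten()

def shot_slice(frames):
    # Cut the video wherever the feature distance between adjacent frames
    # reaches the preset distance threshold.
    slices, current = [], [frames[0]]
    prev = target_feature(frames[0])
    for frame in frames[1:]:
        feat = target_feature(frame)
        if np.linalg.norm(feat - prev) < DISTANCE_THRESHOLD:
            current.append(frame)   # same video slice
        else:
            slices.append(current)  # shot boundary: start a new slice
            current = [frame]
        prev = feat
    slices.append(current)
    return slices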
In one embodiment of the invention, the video slice is a still video slice; the pixel classification module 1002 is configured to take any frame in the still video slice as the image frame to be identified.
In one embodiment of the present invention, the video slice is a motion video slice; the pixel classification module 1002 is configured to: take the first frame in the video slice as a starting frame and calculate the motion amount of pixels in adjacent image frames; and if the motion amount of pixels in a first target image frame is greater than or equal to the first preset motion amount threshold, take the starting frame and the first target image frame as the image frames to be identified.
In one embodiment of the present invention, the pixel classification module 1002 comprises a pixel classification unit configured to input the image frame to be identified into a depth encoder-decoder network model, and to encode and decode the image frame to be identified through the depth encoder-decoder network model, so as to classify the pixels in the image frame to be identified and obtain the pixel types in the image frame to be identified.
In one embodiment of the invention, the depth encoder-decoder network model comprises a cascaded encoder and decoder; the pixel classification unit is configured to: perform convolution from lower layers to higher layers on the image frame to be identified through the encoder to obtain a plurality of features; integrate the plurality of features to form a coded feature; and upsample the coded feature through the decoder to obtain pixel classification information of the same size as the image frame to be identified, the pixel classification information comprising the pixel types.
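A minimal sketch of such a cascaded encoder and decoder, assuming PyTorch and three pixel classes; the layer widths are illustrative rather than taken from the disclosure, and input height and width are assumed divisible by 8.

import torch.nn as nn

class PixelClassifier(nn.Module):
    # Cascaded encoder-decoder: low-to-high convolutions, then upsampling
    # back to the input size for per-pixel classification.
    def __init__(self, num_classes=3):  # e.g. carrier, plane, invalid area
        super().__init__()
        self.encoder = nn.Sequential(   # convolution from lower to higher layers
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(   # upsample the coded feature to input size
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # One score per class per pixel, same H x W as the input frame;
        # argmax over the class dimension yields the pixel classification map.
        return self.decoder(self.encoder(x))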
In an embodiment of the present invention, the object judgment module 1003 is configured to: determine an object to be processed according to the pixel types of the pixels in the image frame to be identified; acquire the area ratio of the object to be processed in the image frame to be identified; compare the area ratio with a preset ratio threshold and determine the target object according to the comparison result; and if the area ratio corresponding to an object to be processed is greater than the preset ratio threshold, take that object to be processed as the target object.
In one embodiment of the present invention, the pixel types include a first pixel type, a second pixel type and a third pixel type, wherein a target object corresponding to the first pixel type is used to provide an implantation carrier for information to be implanted; the target object corresponding to the second pixel type is used for providing an implantation plane for information to be implanted; and the target object corresponding to the third pixel type is an invalid implantation area.
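Given a per-pixel class map such as the one produced by the sketch above, the area-ratio test of the object judgment module might look as follows; the 5% threshold is an assumed preset ratio threshold.

import numpy as np

RATIO_THRESHOLD = 0.05   # assumed preset ratio threshold
VALID_TYPES = (0, 1)     # first (carrier) and second (plane) pixel types; 2 is the invalid area

def target_objects(pixel_types):
    # Return the pixel types whose area ratio in the frame is large enough
    # for the corresponding object to be processed to count as a target object.
    total = pixel_types.size
    found = []
    for cls in VALID_TYPES:
        ratio = np.count_nonzero(pixel_types == cls) / total
        if ratio > RATIO_THRESHOLD:
            found.append(cls)
    return found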
In one embodiment of the invention, the video slice is a still video slice; the region determination module 1004 is configured to: take the first frame containing the target object as a starting frame and the last frame of the video slice as an ending frame; and arrange the starting frame, the ending frame and all frames between them in time order to form the frame sequence segment, and implant the information to be implanted at the position of the target object in the frame sequence segment.
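For a still video slice this reduces to a few lines (an illustrative sketch; contains_target stands for the per-frame target-object check described above):

def still_slice_segment(video_slice, contains_target):
    # Frame sequence segment of a still slice: from the first frame that
    # contains the target object through the last frame of the slice.
    for i, frame in enumerate(video_slice):
        if contains_target(frame):
            return video_slice[i:]  # starting frame .. ending frame, in time order
    return []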
In some embodiments of the present invention, the region determining module 1004 comprises: the motion amount calculation unit is used for calculating the motion amount of pixels in a preset image frame relative to pixels in an initial frame by taking all image frames to be identified containing the target object as the initial frame, and the frame number of the preset image frame and the frame number of the initial frame are separated by a preset value; the selection unit is used for taking the preset image frame as a second target image frame when the motion amount meets a preset condition; the acquisition unit is used for taking the second target image frame as a starting frame and repeating the steps to acquire all second target image frames meeting the preset condition; the sequencing unit is used for sequencing the image frames containing the target object in the second target image frame according to time to form a frame sequence and acquiring the duration of the frame sequence; and the frame sequence segment generating unit is used for taking the frame sequence as the frame sequence segment and implanting information to be implanted into the position where the target object in the frame sequence segment is located when the duration is greater than or equal to a preset time threshold.
In some embodiments of the invention, the selection unit is configured to: compare the motion amount with the first preset motion amount threshold and a second preset motion amount threshold; and when the motion amount is greater than or equal to the second preset motion amount threshold and less than the first preset motion amount threshold, take the preset image frame as the second target image frame.
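The cooperation of the motion amount calculation unit, the selection unit and the acquisition unit might follow the sketch below, reusing STEP, FIRST_MOTION_THRESHOLD and motion_amount from the earlier sketch; the second threshold is likewise an assumed value.

SECOND_MOTION_THRESHOLD = 0.2  # assumed second preset motion amount threshold

def second_target_frames(frames, start):
    # From a starting frame, step forward STEP frames at a time and collect the
    # frames whose motion relative to the current starting frame lies in
    # [SECOND_MOTION_THRESHOLD, FIRST_MOTION_THRESHOLD); each hit becomes
    # the next starting frame.
    hits = [start]
    i = start
    while i + STEP < len(frames):
        m = motion_amount(frames[i], frames[i + STEP])
        if SECOND_MOTION_THRESHOLD <= m < FIRST_MOTION_THRESHOLD:
            i += STEP
            hits.append(i)   # a second target image frame; repeat from here
        else:
            break
    return hits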
Fig. 11 schematically shows a structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present invention.
It should be noted that the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 11, a computer system 1100 includes a Central Processing Unit (CPU)1101, which can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for system operation are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1109 and/or installed from the removable medium 1111. When the computer program is executed by the Central Processing Unit (CPU) 1101, the various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiment of the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (15)

1. A method for identifying an information implantation area, comprising:
acquiring a video to be processed, and slicing the video to be processed to acquire a plurality of video slices;
determining an image frame to be identified from each video slice, and classifying pixels in the image frame to be identified to acquire the pixel types in the image frame to be identified;
determining whether a target object exists in the image frame to be identified according to the pixel types;
and when the target object exists in the image frame to be identified, acquiring a frame sequence segment containing the target object in the video slice according to a preset rule, and taking the position of the target object in the frame sequence segment as an information implantation area.
2. The method for identifying an information implantation area according to claim 1, wherein acquiring a video to be processed and slicing the video to be processed to acquire a plurality of video slices comprises:
extracting target features from the video to be processed;
and performing similarity algorithm identification on the target features in adjacent image frames, and slicing the video to be processed according to the identification result to acquire the plurality of video slices.
3. The method for identifying an information implantation area according to claim 2, wherein performing similarity algorithm identification on the target features in adjacent image frames and slicing the video to be processed according to the identification result to acquire the plurality of video slices comprises:
calculating the distance between target features in the adjacent image frames, and performing similarity algorithm identification according to the distance;
when the distance is smaller than a preset distance threshold, the adjacent image frames belong to the same video slice;
and when the distance is greater than or equal to the preset distance threshold, the adjacent image frames belong to different video slices.
4. The method according to claim 1, wherein the video slices are still video slices;
determining an image frame to be identified from each video slice, including:
taking any frame in the still video slice as the image frame to be identified.
5. The method according to claim 1, wherein the video slices are motion video slices;
determining an image frame to be identified from each video slice, including:
taking the first frame in the video slice as a starting frame, and calculating the motion amount of pixels in adjacent image frames;
and if the motion amount of the pixels in a first target image frame is greater than or equal to a first preset motion amount threshold, taking the starting frame and the first target image frame as the image frames to be identified.
6. The method for identifying an information implantation area according to claim 4 or 5, wherein classifying the pixels in the image frame to be identified to acquire the pixel types in the image frame to be identified comprises:
inputting the image frame to be identified into a depth encoder-decoder network model, and encoding and decoding the image frame to be identified through the depth encoder-decoder network model, so as to classify the pixels in the image frame to be identified and acquire the pixel types in the image frame to be identified.
7. The method according to claim 6, wherein the depth encoder-decoder network model comprises a cascaded encoder and decoder;
and inputting the image frame to be identified into the depth encoder-decoder network model and encoding and decoding the image frame to be identified through the depth encoder-decoder network model, so as to classify the pixels in the image frame to be identified and acquire the pixel types in the image frame to be identified, comprises:
performing convolution from lower layers to higher layers on the image frame to be identified through the encoder to obtain a plurality of features;
integrating the plurality of features to form a coded feature;
and upsampling the coded feature through the decoder to acquire pixel classification information of the same size as the image frame to be identified, wherein the pixel classification information comprises the pixel types.
8. The method for identifying an information implantation area according to claim 1, wherein determining whether a target object exists in the image frame to be identified according to the pixel types comprises:
determining an object to be processed according to the pixel types of the pixels in the image frame to be identified;
acquiring the area ratio of the object to be processed in the image frame to be identified;
comparing the area ratio with a preset ratio threshold value, and determining the target object according to a comparison result;
and if the area ratio corresponding to an object to be processed is greater than the preset ratio threshold, taking the object to be processed as the target object.
9. The method for identifying the information implantation area according to claim 8, wherein the pixel types include a first pixel type, a second pixel type and a third pixel type, wherein a target object corresponding to the first pixel type is used for providing an implantation carrier for the information to be implanted; the target object corresponding to the second pixel type is used for providing an implantation plane for information to be implanted; and the target object corresponding to the third pixel type is an invalid implantation area.
10. The method according to claim 1, wherein the video slices are still video slices;
and acquiring a frame sequence segment containing the target object in the video slice according to a preset rule and taking the position of the target object in the frame sequence segment as an information implantation area comprises:
taking a first frame containing the target object as a starting frame and taking a last frame of the video fragment as an ending frame;
and arranging the starting frame, the ending frame and all frames between them in time order to form the frame sequence segment, and implanting the information to be implanted at the position of the target object in the frame sequence segment.
11. The method for identifying an information implantation area according to claim 5, wherein acquiring a frame sequence segment containing the target object in the video slice according to a preset rule and taking the position of the target object in the frame sequence segment as the information implantation area comprises:
calculating the motion amount of pixels in a preset image frame relative to pixels in an initial frame by taking all image frames to be recognized containing the target object as the initial frame, wherein the frame number of the preset image frame and the frame number of the initial frame are separated by a preset numerical value;
when the motion amount meets a preset condition, taking the preset image frame as a second target image frame;
taking the second target image frame as a starting frame, and repeating the steps to obtain all second target image frames meeting the preset conditions;
sequencing the image frames containing the target object in the second target image frame according to time to form a frame sequence, and acquiring the duration of the frame sequence;
and when the duration is greater than or equal to a preset time threshold, taking the frame sequence as the frame sequence segment, and implanting information to be implanted into the position where the target object is located in the frame sequence segment.
12. The method for identifying an information implantation area according to claim 11, wherein taking the preset image frame as the second target image frame when the motion amount satisfies the preset condition comprises:
comparing the motion amount with the first preset motion amount threshold and a second preset motion amount threshold;
and when the motion amount is greater than or equal to the second preset motion amount threshold and less than the first preset motion amount threshold, taking the preset image frame as the second target image frame.
13. An apparatus for identifying an information implantation area, comprising:
the video segmentation module is used for acquiring a video to be processed and slicing the video to be processed to acquire a plurality of video slices;
the pixel classification module is used for determining an image frame to be identified from each video slice and classifying pixels in the image frame to be identified to acquire the pixel types in the image frame to be identified;
the object judgment module is used for determining whether a target object exists in the image frame to be identified according to the pixel types;
and the region determination module is used for acquiring, when the target object exists in the image frame to be identified, a frame sequence segment containing the target object in the video slice according to a preset rule, and using the position of the target object in the frame sequence segment as an information implantation area.
14. A computer storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for identifying an information implantation area according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for identifying an information implantation area according to any one of claims 1 to 12.
CN201910528228.0A 2019-06-18 2019-06-18 Information implantation area identification method and device, storage medium and electronic equipment Active CN112101075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910528228.0A CN112101075B (en) 2019-06-18 2019-06-18 Information implantation area identification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112101075A true CN112101075A (en) 2020-12-18
CN112101075B CN112101075B (en) 2022-03-25

Family

ID=73749351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910528228.0A Active CN112101075B (en) 2019-06-18 2019-06-18 Information implantation area identification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112101075B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453649A (en) * 2008-12-30 2009-06-10 浙江大学 Key frame extracting method for compression domain video stream
US20120263433A1 (en) * 2011-04-12 2012-10-18 Microsoft Corporation Detecting Key Roles and Their Relationships from Video
CN105282573A (en) * 2014-07-24 2016-01-27 腾讯科技(北京)有限公司 Embedded information processing method, client side and server
CN104735465A (en) * 2015-03-31 2015-06-24 北京奇艺世纪科技有限公司 Method and device for implanting planar pattern advertisements into video frame
CN105975911A (en) * 2016-04-28 2016-09-28 大连民族大学 Energy perception motion significance target detection algorithm based on filter
CN107343211A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Method of video image processing, device and terminal device
CN106408592A (en) * 2016-09-09 2017-02-15 南京航空航天大学 Target tracking method based on target template updating
KR20180131226A (en) * 2017-05-31 2018-12-10 삼성에스디에스 주식회사 Apparatus and method for programming advertisement
CN107493488A (en) * 2017-08-07 2017-12-19 上海交通大学 The method that video content thing based on Faster R CNN models is intelligently implanted into
CN109168034A (en) * 2018-08-28 2019-01-08 百度在线网络技术(北京)有限公司 Merchandise news display methods, device, electronic equipment and readable storage medium storing program for executing
CN109658413A (en) * 2018-12-12 2019-04-19 深圳前海达闼云端智能科技有限公司 A kind of method of robot target grasping body position detection
CN109886950A (en) * 2019-02-22 2019-06-14 北京百度网讯科技有限公司 The defect inspection method and device of circuit board
CN109874061A (en) * 2019-03-22 2019-06-11 北京奇艺世纪科技有限公司 A kind of processing method of live video, device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIAO W S 等: ""AdImage:Video advertising by image matching and AD scheduling optimization"", 《ACM》 *
谭凯 等: ""基于镜头分割与空域注意力模型的视频广告分类方法"", 《计算机科学》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113490028A (en) * 2021-06-01 2021-10-08 深圳喜悦机器人有限公司 Video processing method, device, storage medium and terminal
CN113518256B (en) * 2021-07-23 2023-08-08 腾讯科技(深圳)有限公司 Video processing method, video processing device, electronic equipment and computer readable storage medium
CN113518256A (en) * 2021-07-23 2021-10-19 腾讯科技(深圳)有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN113923516A (en) * 2021-09-29 2022-01-11 平安科技(深圳)有限公司 Video processing method, device and equipment based on deep learning model and storage medium
CN113923516B (en) * 2021-09-29 2023-08-29 平安科技(深圳)有限公司 Video processing method, device, equipment and storage medium based on deep learning model
CN113691835A (en) * 2021-10-21 2021-11-23 星河视效科技(北京)有限公司 Video implantation method, device, equipment and computer readable storage medium
CN113691835B (en) * 2021-10-21 2022-01-21 星河视效科技(北京)有限公司 Video implantation method, device, equipment and computer readable storage medium
CN114449346B (en) * 2022-02-14 2023-08-15 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114449346A (en) * 2022-02-14 2022-05-06 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN114627036A (en) * 2022-03-14 2022-06-14 北京有竹居网络技术有限公司 Multimedia resource processing method and device, readable medium and electronic equipment
CN114627036B (en) * 2022-03-14 2023-10-27 北京有竹居网络技术有限公司 Processing method and device of multimedia resources, readable medium and electronic equipment
CN114900722A (en) * 2022-05-06 2022-08-12 浙江工商大学 AR technology-based personalized advertisement implanting method and system
CN116074582A (en) * 2023-01-31 2023-05-05 北京奇艺世纪科技有限公司 Implant position determining method and device, electronic equipment and storage medium
CN116074581A (en) * 2023-01-31 2023-05-05 北京奇艺世纪科技有限公司 Implant position determining method and device, electronic equipment and storage medium
CN117197706A (en) * 2023-04-23 2023-12-08 青岛尘元科技信息有限公司 Method and system for dividing progressive lens, storage medium and electronic device
CN116188460A (en) * 2023-04-24 2023-05-30 青岛美迪康数字工程有限公司 Image recognition method and device based on motion vector and computer equipment
CN116188460B (en) * 2023-04-24 2023-08-25 青岛美迪康数字工程有限公司 Image recognition method and device based on motion vector and computer equipment

Also Published As

Publication number Publication date
CN112101075B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN112101075B (en) Information implantation area identification method and device, storage medium and electronic equipment
CN112153483B (en) Information implantation area detection method and device and electronic equipment
US11601630B2 (en) Video processing method, electronic device, and non-transitory computer-readable medium
CN110740387B (en) Barrage editing method, intelligent terminal and storage medium
US9271035B2 (en) Detecting key roles and their relationships from video
US20200364461A1 (en) Method of obtaining mask frame data, computing device, and readable storage medium
EP2034426A1 (en) Moving image analyzing, method and system
US11871086B2 (en) Method of displaying comment information, computing device, and readable storage medium
CN110602554A (en) Cover image determining method, device and equipment
CN111062314B (en) Image selection method and device, computer readable storage medium and electronic equipment
CN110334753B (en) Video classification method and device, electronic equipment and storage medium
CN107358141B (en) Data identification method and device
CN111836118B (en) Video processing method, device, server and storage medium
CN111078940A (en) Image processing method, image processing device, computer storage medium and electronic equipment
CN113784171A (en) Video data processing method, device, computer system and readable storage medium
CN112819509B (en) Method, system, electronic device and storage medium for automatically screening advertisement pictures
CN111709762B (en) Information matching degree evaluation method, device, equipment and storage medium
CN111914850B (en) Picture feature extraction method, device, server and medium
CN110636322B (en) Multimedia data processing method and device, intelligent terminal and storage medium
CN116389849A (en) Video generation method, device, equipment and storage medium
CN116261009A (en) Video detection method, device, equipment and medium for intelligently converting video audience
Zhang et al. Text extraction from images captured via mobile and digital devices
CN116137671A (en) Cover generation method, device, equipment and medium
EP3772856A1 (en) Identification of the intro part of a video content
CN113117341B (en) Picture processing method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40035739

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant