CN114120170A - Video picture analysis method, apparatus, device, medium, and program product - Google Patents

Publication number: CN114120170A
Application number: CN202111222803.8A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Inventors: 汪瑜, 许佳禾, 冯莹
Applicant and current assignee: Beijing Kuangshi Technology Co Ltd (assignee listings may be inaccurate; Google has not performed a legal analysis)
Priority to CN202111222803.8A
Classification (Landscapes): Processing Or Creating Images (AREA)
Abstract

Embodiments of the present application provide a video picture analysis method, apparatus, device, medium, and program product, belonging to the technical field of image analysis and aimed at improving the efficiency of video picture analysis and optimizing the user experience. The video picture analysis method includes: acquiring a video picture in a video stream; in response to the acquired coordinate sets, creating, on a canvas corresponding to the video picture, a plurality of layers in one-to-one correspondence with the coordinate sets, where each coordinate set is used to draw the border of one region of interest; drawing and displaying, on the layer corresponding to each coordinate set, the border of the region of interest corresponding to that coordinate set; and analyzing, in the video picture, the regions of interest framed by the drawn borders, respectively, to obtain an image analysis result of the video picture.

Description

Video picture analysis method, apparatus, device, medium, and program product
Technical Field
The present application relates to the field of image analysis technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for video image analysis.
Background
In machine vision and image analysis, a region to be processed needs to be delineated in the image under processing in the form of a shape such as a polygon, circle, or ellipse. This process is called border drawing, and the region enclosed by a drawn border is called a region of interest (ROI). In the field of image analysis, it is often necessary to analyze the image within the ROI, for example, to perform face recognition, object classification, and the like on it.
In the related art, a plurality of borders generally need to be drawn in an image. For example, when analyzing the crosswalks of a certain street, every crosswalk on the street needs to be framed. In this case, ROIs (i.e., borders) of various shapes are typically drawn on the image using the canvas technique, so that the ROI borders can be rendered. However, with this approach all the borders are drawn in one pass, and when one of them subsequently needs to be modified, all of the borders must be cleared and redrawn. A great deal of time is therefore wasted on the border-drawing step, the whole process is inflexible, and the efficiency of image analysis suffers.
Disclosure of Invention
In view of the above problems, a video picture analysis method, apparatus, device, medium, and program product according to embodiments of the present application are proposed to overcome, or at least partially solve, the above problems.
In order to solve the above problem, in a first aspect of the present application, there is provided a video picture analysis method, including:
acquiring a video picture in a video stream;
in response to the acquired coordinate sets, creating, on a canvas corresponding to the video picture, a plurality of layers in one-to-one correspondence with the coordinate sets, where each coordinate set is used to draw the border of one region of interest;
drawing and displaying, on the layer corresponding to each coordinate set, the border of the region of interest corresponding to that coordinate set;
and analyzing, in the video picture, the regions of interest framed by the drawn borders, respectively, to obtain an image analysis result of the video picture.
Optionally, each border is bound with configuration parameters, the configuration parameters being obtained from the user's configuration operation on the configuration parameters of that border. Analyzing, in the video picture, the regions of interest framed by the drawn borders to obtain the image analysis result of the video picture includes:
analyzing, based on the configuration parameters bound to each of the plurality of borders, the regions of interest framed by the plurality of borders in the video picture, respectively, to obtain the image analysis result.
Optionally, after obtaining a plurality of coordinate sets that satisfy the border drawing condition, the method further includes:
storing the plurality of coordinate sets and the plurality of borders;
after obtaining the image analysis result of the video picture, obtaining a next video picture of the video picture;
under the condition that the similarity between the picture content of the video picture and the picture content of the next video picture is greater than a preset threshold value, determining the region of interest framed in the next video picture by each of the plurality of borders according to the plurality of coordinate sets and the plurality of borders stored in advance;
and analyzing the region of interest framed and selected by each of the plurality of frames in the next video picture to obtain an image analysis result of the next video picture.
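The reuse condition above can be sketched in TypeScript. This is a minimal illustration only: the embodiment does not specify a similarity metric, so the normalized mean absolute pixel difference used here, the function names (`frameSimilarity`, `canReuseBorders`), and the 0.95 threshold are all assumptions.

```typescript
// Similarity between two video pictures, given as equal-length grayscale
// pixel arrays with values in [0, 255]: 1 minus the normalized mean
// absolute pixel difference (hypothetical metric for illustration).
function frameSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("frames must have the same non-zero size");
  }
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff += Math.abs(a[i] - b[i]);
  return 1 - diff / (a.length * 255);
}

// Reuse the stored coordinate sets and borders only when the current
// picture and the next picture are similar enough.
function canReuseBorders(a: number[], b: number[], threshold = 0.95): boolean {
  return frameSimilarity(a, b) > threshold;
}
```

For two identical pictures the similarity is exactly 1, so the stored borders are reused; for pictures that differ strongly the borders would be redrawn.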
Optionally, the method further comprises:
storing the coordinate sets, the borders and the configuration parameters bound to the borders, wherein the configuration parameters are obtained according to configuration operation of a user on the configuration parameters of each border;
the analyzing the regions of interest framed and selected by the plurality of frames in the next video picture respectively to obtain the image analysis result of the next video picture comprises:
under the condition that the similarity between the picture content of the image area of the video picture and the picture content of the image area of the next video picture is larger than a preset threshold value, determining the region of interest framed in the next video picture by each of the plurality of frames according to the plurality of coordinate sets and the plurality of frames which are stored in advance;
and analyzing the region of interest framed and selected by the plurality of frames in the next video picture according to the configuration parameters bound by the plurality of frames to obtain an image analysis result.
Optionally, the method further comprises:
generating an identifier of each frame in the plurality of frames, and binding the identifier with the frame;
creating and displaying a configuration parameter editing window of each frame;
acquiring the configuration parameters input by the user through the displayed configuration parameter editing window;
and in response to the saving operation of the user on each frame in the plurality of frames, binding the configuration parameters of each frame with the identification of the frame.
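The identifier generation and parameter binding described above can be sketched as follows. The names (`BorderRegistry`, `BorderConfig`) and the configuration fields are hypothetical; the embodiment only requires that each border's identifier be bound to its configuration parameters on save.

```typescript
// Hypothetical shape of the parameters entered in the editing window.
interface BorderConfig {
  analysisType: string; // e.g. "pedestrian-detection" (illustrative field)
  threshold: number;
}

class BorderRegistry {
  private nextId = 1;
  private configs = new Map<string, BorderConfig>();

  // Generate a unique identifier for a newly drawn border.
  createId(): string {
    return `border-${this.nextId++}`;
  }

  // Bind the parameters entered in the editing window when the user saves.
  bind(id: string, config: BorderConfig): void {
    this.configs.set(id, config);
  }

  // Read back the bound parameters, e.g. to repopulate the editing window.
  get(id: string): BorderConfig | undefined {
    return this.configs.get(id);
  }
}
```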
Optionally, the state of the canvas comprises at least an edit state; in response to the acquired coordinate sets, creating a plurality of image layers corresponding to the coordinate sets one to one on a canvas corresponding to the video picture, including:
and under the condition that the state of the canvas is an editing state, responding to the acquired coordinate sets, and creating a plurality of image layers which are in one-to-one correspondence with the coordinate sets on the canvas corresponding to the video picture.
Optionally, the state of the canvas comprises at least a selection state; the method further comprises the following steps:
under the condition that the state of the canvas is a selection state, determining a selected frame according to the current coordinate of input equipment on the canvas;
and setting the selected border to an editable state according to the identifier of the selected border, and/or reading the configuration parameters bound to the selected border and displaying them in a configuration parameter editing window, where the configuration parameters displayed in the editing window are in an editable state.
Optionally, when the state of the canvas is a selected state, determining the selected border according to the current coordinate of the input device on the canvas includes:
detecting the number of frames where the current coordinates are located under the condition that the state of the canvas is a selection state;
determining the frame where the current coordinate is located as the selected frame under the condition that the number of the frames where the current coordinate is located is one;
determining the overlapping degree between at least two frames where the current coordinate is located under the condition that the number of the frames where the current coordinate is located is at least two;
and determining one frame of the at least two frames as the selected frame according to the overlapping degree.
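Counting the borders that contain the current coordinate can be done with a standard ray-casting point-in-polygon test, assuming each border is stored as its list of endpoint coordinates; the sketch below is illustrative and the helper names are assumptions.

```typescript
interface Point { x: number; y: number; }

// Ray casting: cast a horizontal ray from p and count how many polygon
// edges it crosses; an odd count means p lies inside the polygon.
function pointInPolygon(p: Point, polygon: Point[]): boolean {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i], b = polygon[j];
    const crosses =
      (a.y > p.y) !== (b.y > p.y) &&
      p.x < ((b.x - a.x) * (p.y - a.y)) / (b.y - a.y) + a.x;
    if (crosses) inside = !inside;
  }
  return inside;
}

// All borders whose polygon contains the current cursor coordinate.
function bordersAt(p: Point, borders: Point[][]): Point[][] {
  return borders.filter((poly) => pointInPolygon(p, poly));
}
```

When `bordersAt` returns one border, it is the selected border; when it returns two or more, the overlap-based rule of the embodiment decides among them.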
Optionally, determining, according to the overlapping degree, one of the at least two borders as the selected border, including:
determining the completely covered border as the selected border when the overlap degree indicates that one border is completely covered by another border;
and, when the overlap degree indicates that the at least two borders partially overlap one another, determining the center coordinate of the mutual overlap region, and determining the selected border according to the distance from the center coordinate of each of the at least two borders to the center coordinate of the overlap region.
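The selection rule above can be sketched as follows for two overlapping borders. The embodiment does not specify how the overlap region or the border centers are computed, so this sketch approximates both with bounding boxes and vertex averages; all function names are hypothetical.

```typescript
interface Pt2 { x: number; y: number; }
interface Rect { minX: number; minY: number; maxX: number; maxY: number; }

function bbox(poly: Pt2[]): Rect {
  return {
    minX: Math.min(...poly.map((p) => p.x)),
    minY: Math.min(...poly.map((p) => p.y)),
    maxX: Math.max(...poly.map((p) => p.x)),
    maxY: Math.max(...poly.map((p) => p.y)),
  };
}

// Vertex average as an approximation of a border's center.
function center(poly: Pt2[]): Pt2 {
  const n = poly.length;
  return {
    x: poly.reduce((s, p) => s + p.x, 0) / n,
    y: poly.reduce((s, p) => s + p.y, 0) / n,
  };
}

// A fully covered border wins outright; otherwise the border whose center
// is closest to the center of the (bounding-box) overlap region is selected.
function selectBorder(a: Pt2[], b: Pt2[]): Pt2[] {
  const ra = bbox(a), rb = bbox(b);
  const covered = (inner: Rect, outer: Rect) =>
    inner.minX >= outer.minX && inner.minY >= outer.minY &&
    inner.maxX <= outer.maxX && inner.maxY <= outer.maxY;
  if (covered(ra, rb)) return a;
  if (covered(rb, ra)) return b;
  const overlapCenter: Pt2 = {
    x: (Math.max(ra.minX, rb.minX) + Math.min(ra.maxX, rb.maxX)) / 2,
    y: (Math.max(ra.minY, rb.minY) + Math.min(ra.maxY, rb.maxY)) / 2,
  };
  const d = (p: Pt2, q: Pt2) => Math.hypot(p.x - q.x, p.y - q.y);
  return d(center(a), overlapCenter) <= d(center(b), overlapCenter) ? a : b;
}
```

The rationale for the covered-border rule is that a border nested entirely inside another could otherwise never be selected by clicking.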
Optionally, the state of the canvas further includes a clearing state; the method further includes:
when an operation by which the user puts the canvas into the clearing state is detected, clearing every border on the canvas and entering the editing state.
Optionally, the coordinate set that satisfies the border drawing condition is obtained by:
monitoring a coordinate acquisition event on the canvas;
acquiring the corresponding coordinates according to the monitored coordinate acquisition events, and adding those acquired coordinates not yet added to any coordinate set into the current coordinate set, until a coordinate-acquisition stop event triggered by the input device on the canvas is detected;
sequentially connecting all coordinates in the current coordinate set according to the sequence of adding the coordinates into the current coordinate set to obtain a frame to be detected;
performing polygon verification on the frame to be detected;
and, when the border to be detected passes the polygon verification, taking the current coordinate set as a coordinate set of a border to be drawn.
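The polygon verification step is not detailed in the embodiment; one plausible minimal check, sketched below, accepts a coordinate set only if connecting its points in click order yields at least three points enclosing a non-zero area (computed with the shoelace formula). A production implementation might additionally reject self-intersecting polylines; the names used here are assumptions.

```typescript
interface Pt { x: number; y: number; }

// Shoelace formula: twice the signed area of the closed polyline formed by
// connecting the points in order and closing back to the first point.
function signedArea2(points: Pt[]): number {
  let s = 0;
  for (let i = 0; i < points.length; i++) {
    const a = points[i], b = points[(i + 1) % points.length];
    s += a.x * b.y - b.x * a.y;
  }
  return s;
}

// A coordinate set passes verification when its points form a
// non-degenerate polygon (at least 3 points, non-zero enclosed area).
function passesPolygonCheck(points: Pt[]): boolean {
  return points.length >= 3 && Math.abs(signedArea2(points)) > 0;
}
```

Collinear click sequences therefore fail the check even when they contain three or more points.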
In a second aspect of the embodiments of the present application, there is provided a video picture analysis apparatus, including:
the image acquisition module is used for acquiring a video image in the video stream;
the layer creating module is used for creating, in response to the acquired coordinate sets, a plurality of layers in one-to-one correspondence with the coordinate sets on a canvas corresponding to the video picture, where each coordinate set is used to draw the border of one region of interest;
the border drawing module is used for drawing and displaying, on the layer corresponding to each coordinate set, the border of the region of interest corresponding to that coordinate set;
and the picture analysis module is used for analyzing, in the video picture, the regions of interest framed by the drawn borders, respectively, to obtain an image analysis result of the video picture.
In a third aspect of the embodiments of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the video picture analysis method described in the embodiments of the first aspect.
In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is further provided, which stores a computer program for causing a processor to execute the video picture analysis method according to the embodiments of the first aspect of the present application.
In a fifth aspect of the embodiments of the present application, there is provided a computer program product including a computer program or computer instructions which, when executed by a processor, implement the video picture analysis method according to the first aspect.
The embodiment of the application has the following advantages:
in the embodiment of the application, a video picture in a video stream can be obtained, and a plurality of layers are created on a canvas corresponding to the video picture in response to obtaining a plurality of coordinate sets meeting a frame drawing condition; then, respectively drawing and displaying a plurality of frames on the plurality of layers, wherein the plurality of frames correspond to the plurality of coordinate sets one by one; and then, analyzing a plurality of image areas respectively corresponding to the plurality of frames in the video picture to obtain an image analysis result of the video picture.
When borders are drawn for a video picture, each border is drawn from a coordinate set that satisfies the border drawing condition, and different coordinate sets trigger the creation of different layers, so that different borders are drawn on different layers. Because the drawn borders reside on different layers, they are independent of one another. When one drawn border is edited or modified, the modification can be made on that border's own layer without affecting the borders on the other layers, so there is no need to delete all drawn borders and redraw them. This makes it convenient for the user to edit and modify drawn borders, greatly shortens the time spent on editing and modification, improves the efficiency of video picture analysis, and optimizes the user experience.
By adopting the technical scheme of the embodiment of the application, a plurality of frames can be drawn on one video picture, and different frames are positioned on different layers, so that the image analysis of a plurality of image areas of the same video picture can be realized, and the image analysis efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a schematic diagram of an image analysis for pedestrian detection according to an embodiment of the present application;
FIG. 2 is a flow chart illustrating the steps of a video frame analysis method according to an embodiment of the present application;
FIG. 3 is a flow chart of the steps in an implementation of the present application to obtain a set of coordinates;
FIG. 4 is a schematic diagram of a frame to be detected connected to coordinates collected on a canvas in an implementation of the present application;
FIG. 5 is a flowchart of the steps performed in an implementation of the present application to determine a user-selected bounding box;
FIG. 6 is a schematic diagram of a case where current coordinates are located in an overlapping area of a plurality of frames in an implementation of the present application;
FIG. 7 is a flowchart illustrating the steps of performing image analysis on an image region in a video frame in accordance with an embodiment of the present application;
FIG. 8 is an interface diagram of a border drawing interface in the practice of the present application;
fig. 9 is a block diagram of a video frame analysis apparatus according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features, and advantages of the present application more comprehensible, the embodiments of the present application are described below in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image analysis, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. As a discipline, artificial intelligence is comprehensive and involves a wide range of technical fields such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence, in particular enables machines to perceive the world; computer vision technologies generally include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image analysis, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning.
With the research and progress of artificial intelligence technology, the technology is applied to various fields, such as security, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, testimony verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
In the related art, in processes such as pedestrian detection and pedestrian recognition, after a video picture is acquired, borders need to be drawn on it to identify the image areas to be analyzed. Referring to fig. 1, which shows an example image analysis diagram for pedestrian detection, the image is a video picture from a video stream captured at a certain intersection. The number of pedestrians crossing each zebra crossing needs to be detected, so the image area of each zebra crossing must be framed in order to perform pedestrian recognition on the framed region of interest; accordingly, a border is drawn around each zebra-crossing area on the image, the border 101 in fig. 1 being one example of a drawn border. If there are 5 zebra crossings in fig. 1, 5 borders need to be drawn to frame the 5 zebra crossings.
In the related art, when one of the drawn borders of multiple regions of interest is modified, all of the drawn borders need to be cleared and then redrawn. For example, as shown in fig. 1, to modify the border 101, all 5 borders must be cleared and then all 5 redrawn. The modification is therefore time-consuming and labor-intensive, the border drawing efficiency is low, and the image analysis efficiency suffers.
In view of this, to avoid having to clear and redraw all drawn borders whenever one of them is modified during image analysis, the present application proposes the following technical idea: after a video picture is obtained, for each of a plurality of coordinate sets satisfying the border drawing condition, a layer corresponding to that coordinate set is created, with different coordinate sets corresponding to different layers; the border corresponding to each coordinate set is then drawn on its own layer, and during subsequent image analysis, the image area marked out by each border is analyzed. Because different drawn borders reside on different layers, they are independent of one another, so a single border can be selected and modified. When one drawn border is edited, the other drawn borders do not need to be cleared, which improves border editing and modification efficiency, greatly shortens the time users spend editing and modifying borders, improves image analysis efficiency, and optimizes the user experience.
Referring to fig. 2, a flowchart illustrating steps of a video picture analysis method according to an embodiment of the present application is shown, and as shown in fig. 2, the method according to the embodiment of the present application may be applied to a client, where the client is an application program running on a terminal device, the client may provide a frame drawing interface for a user, and the user may complete frame drawing of a video picture in the frame drawing interface. The method specifically comprises the following steps:
step S201: a video picture in a video stream is acquired.
In this embodiment, the video stream may be a video stream captured for a certain target scene, for example, a certain street, a certain intersection, or a certain room. The video picture may be the first key frame in the video stream. In general, an image area of a video picture may include a plurality of objects, and the plurality of objects may refer to: objects, animals, specific scenes, etc. As shown in fig. 1, the object may be a zebra crossing.
When shooting a video stream of a target scene, the video stream can be captured by a dome camera, which is generally installed at a fixed position. A dome camera is a camera that can rotate within a range of angles; in practice, it can shoot the target scene in a specified pose to obtain the video stream.
Step S202: and in response to the acquired coordinate sets, creating a plurality of image layers corresponding to the coordinate sets one by one on a canvas corresponding to the video picture, wherein each coordinate set is used for drawing a frame of a region of interest.
In this embodiment, a border may refer to a figure formed by a plurality of line segments connected end to end. The canvas can be understood as the drawing surface, within the border drawing interface, on which borders are drawn; border drawing generally takes place on the canvas. The canvas may carry position coordinate information; specifically, it may be a canvas of the same size as the video picture, so that subsequently drawn borders can be matched to the image areas of the multiple objects in the video picture.
The drawn border is used to frame the image area to be recognized in the video picture. To make it convenient for the user to draw a border following the shape of an image area, the video picture can be displayed in the canvas. As shown in fig. 1, the area 102 is the area where the canvas is located, and the intersection image can be displayed in the canvas area, so that when drawing, the user can use the image as a reference to draw a border of the corresponding shape.
A coordinate set may include a plurality of coordinates, each of which may be the coordinate of a position on the canvas where a preset operation was performed, for example, the coordinate of a clicked position on the canvas. The coordinates in a coordinate set can therefore be understood as the coordinates of positions clicked on the canvas one after another, with each coordinate serving as an endpoint of a border; during subsequent drawing, the endpoints can be connected in sequence to obtain a border of the corresponding shape. Thus, a coordinate set may include the endpoints of one border to be drawn.
A coordinate set here refers to a coordinate set that satisfies the border drawing condition. Satisfying the border drawing condition can be understood as a border-drawing trigger event occurring in the border drawing interface. The trigger event may be a specific user operation, for example a mouse double-click, or the user clicking a specific control in the interface, such as a border-generation control; it may also mean that a specific attribute of the coordinate set satisfies a preset condition, for example, that the number of coordinates in the set reaches a preset number, or that the coordinates in the set satisfy some requirement on the shape of the border.
The plurality of coordinate sets satisfying the frame drawing condition may refer to that the plurality of coordinate sets respectively satisfy the frame drawing condition.
For example, taking the border drawing trigger event to be a mouse double-click by the user: after the user has clicked the endpoints of a border, the client needs to draw and display the border automatically. At that point the user can double-click the mouse, and the double-click event is then understood as satisfying the border drawing condition.
In practice, one coordinate set corresponds to one frame, so that different coordinate sets may correspond to different frames, for example, as shown in fig. 1, if a frame corresponding to 5 zebra stripes needs to be drawn, there are 5 corresponding coordinate sets, and different coordinate sets correspond to different frames of the zebra stripes.
After a coordinate set satisfying the border drawing condition is obtained, that set is treated as an independent, completed coordinate set. When a preset operation on the canvas is then detected, indicating that the user wants to draw a new border, a new coordinate set is created, and the coordinates subsequently marked by the user are added to it until the border drawing condition is met, yielding the coordinate set of the new border.
For example, after the coordinate set of the border 101 is obtained, if the user clicks the left mouse button at some position in the region 103, a new coordinate set is created and the coordinates of the clicked position are added to it. Assuming the user then clicks 3 more positions in the region 103, the coordinates of those 3 positions are added to the new coordinate set; when the user subsequently double-clicks the mouse, the border can be drawn from the new coordinate set.
The click mentioned here may be triggered by the left or right mouse button, or, when the terminal device has a touch screen, it may be a touch-based tap.
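The collection flow described above, with a double-click as the border-drawing trigger event, can be sketched as follows; `CoordinateCollector` and its method names are hypothetical.

```typescript
interface Coord { x: number; y: number; }

class CoordinateCollector {
  // Coordinate sets that have satisfied the border drawing condition.
  readonly finished: Coord[][] = [];
  private current: Coord[] = [];

  // A single click on the canvas adds one endpoint to the current set.
  onClick(c: Coord): void {
    this.current.push(c);
  }

  // A double-click is the border-drawing trigger event: it closes the
  // current set; the next click then starts a fresh set for a new border.
  onDoubleClick(): void {
    if (this.current.length > 0) {
      this.finished.push(this.current);
      this.current = [];
    }
  }
}
```

Each completed set in `finished` would then trigger the creation of its own layer and the drawing of its border.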
Step S203: and drawing and displaying a frame of the region of interest corresponding to each coordinate set on the layer corresponding to each coordinate set.
In this embodiment, after each coordinate set satisfying the border drawing condition is obtained, a layer corresponding to that coordinate set may be created, so that different borders are drawn on different layers. A layer can be understood as an image layer carrying a border; the borders on different layers are independent of one another, for example, the border on layer 1 and the border on layer 2 are independent and can each be edited and modified separately. In practice, the final display page can be formed by stacking the layers in a certain order.
In this embodiment, different layers may be created for different coordinate sets, and each time a coordinate set is obtained, one layer is newly created, so that the number of layers may be the same as the number of coordinate sets. As shown in fig. 1, frames corresponding to 5 zebra stripes need to be drawn, and 5 coordinate sets are obtained, so that 5 corresponding layers are created.
After a layer is created, a frame of a coordinate set corresponding to the layer can be drawn on the layer. In specific implementation, positions of a plurality of coordinates in a coordinate set on the canvas can be determined on the layer, so that a plurality of positions are obtained, and then the positions are connected in sequence, specifically, two adjacent positions can be connected in sequence, so that a frame is drawn; or the positions can be connected in sequence according to the time sequence of the coordinate clicking to draw the frame. In practice, a plurality of positions may be connected according to actual conditions.
After the frame corresponding to the coordinate set is drawn on the layer, the layer with the frame can be understood as a complete image, because different layers are created for different coordinate sets, and after the frame is drawn on different layers, a plurality of mutually independent images are obtained.
For example, a border for the corresponding zebra crossing is drawn on each layer, yielding 5 layers with a corresponding border drawn on each. The borders of the 5 zebra crossings are thus independent of one another: when one border is redrawn, only the layer of that border is affected, and the other borders remain untouched.
Specifically, after the frame corresponding to the coordinate set is drawn on the layer, the layer with the frame may be understood as a complete image. In practice, when multiple frames need to be drawn, if they are all created on the same layer, the multiple frames can be understood as being on the same complete image (the multiple frames cannot be separated), so one of the frames cannot be modified on its own; when one of the frames is to be redrawn, the entire complete image needs to be deleted and everything redrawn before the modification takes effect. In the present application, the multiple frames are drawn on multiple separate layers, so that when one of the frames is redrawn, it is only necessary to delete that frame on its own layer and redraw it, while the frames located on the other layers are unaffected. In this way, in the situation where multiple frames need to be drawn, the time the user spends editing and modifying frames is greatly reduced, the user experience is optimized, the efficiency of frame editing and modification is improved, and the efficiency of image analysis is improved in turn.
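The layer-per-frame bookkeeping described above can be sketched as follows. This is a minimal illustration, not the application's implementation; `LayerStore` and its method names are hypothetical, and a "layer" is reduced to the coordinate set of the frame it carries:

```python
class LayerStore:
    """One layer per coordinate set: redrawing a frame touches only its own layer."""

    def __init__(self):
        self.layers = {}   # layer_id -> coordinate set (the frame on that layer)
        self._next_id = 0

    def create_layer(self, coord_set):
        """Create a new layer for a coordinate set and draw its frame on it."""
        layer_id = self._next_id
        self._next_id += 1
        self.layers[layer_id] = list(coord_set)
        return layer_id

    def redraw(self, layer_id, new_coord_set):
        """Delete and redraw the frame on a single layer; other layers are untouched."""
        self.layers[layer_id] = list(new_coord_set)

store = LayerStore()
a = store.create_layer([(0, 0), (4, 0), (4, 3)])
b = store.create_layer([(5, 5), (9, 5), (9, 8)])
store.redraw(a, [(0, 0), (6, 0), (6, 4)])   # layer b is unaffected
```

Because each coordinate set owns its own layer, `redraw` never forces the other frames to be deleted and redrawn, which is the efficiency point the paragraph above makes.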
Step S204: respectively analyzing the regions of interest framed by the drawn borders in the video picture, so as to obtain an image analysis result of the video picture.
In this embodiment, the drawn frame is drawn on the canvas corresponding to the video picture, and is used to frame out the region of interest in the video picture, in practice, the frame of the plurality of regions of interest drawn for the video picture may be saved, specifically, the positions and shapes of the plurality of frames in the video picture may be saved to indicate the framed region of interest, and further, the image region that needs to be analyzed in the video picture may be determined through the frame.
For example, as shown in fig. 1, after the frame of the zebra crossing is drawn, the position and the shape of the drawn frame in the video picture may be saved; when performing image analysis, the saved position and shape of the drawn frame may be used to determine the region of interest it frames in the video picture, so that the picture content in that image region can be analyzed.
Because a plurality of frames are drawn, regions of interest respectively corresponding to the frames are present in the video picture, and during analysis, image analysis of the same analysis task can be performed respectively for each region of interest selected by the frames, for example, pedestrian recognition is performed for each of the regions of interest.
Of course, in one embodiment, different tasks of image analysis may be performed for selected regions of interest framed by different frames, for example, pedestrian detection may be performed for selected regions of interest framed by one frame, and lane occupancy detection may be performed for selected regions of interest framed by another frame. In this case, the region of interest framed by each frame may be segmented from the video frame, and then sent to an image analysis unit (for example, an image analysis model) corresponding to the region of interest, so as to obtain an image analysis result of the region of interest.
Of course, in other cases, there may be overlapping regions between the regions of interest corresponding to different frames, so when segmenting each region of interest, the segmentation may be performed on the original video picture, so that each segmented region of interest is a complete region.
In yet another embodiment, the selected regions of interest framed by a portion of the frames may be subjected to image analysis for the same task, while the selected regions of interest framed by different frames may be subjected to image analysis for different tasks.
In order to realize image analysis of two or more different analysis tasks, task identifiers can be bound to each frame after the drawing of each frame is completed, and then corresponding image analysis can be performed on the region of interest selected by the frame based on the bound task identifiers during image analysis.
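The task-identifier binding just described can be illustrated as a simple dispatch table. This is a hedged sketch: the analyzer functions, the task names, and the frame representation are all stand-ins for the image analysis units the text mentions, not an actual API:

```python
# Hypothetical analyzers standing in for the image analysis units in the text.
def detect_pedestrians(region):
    return f"pedestrians in {region}"

def detect_lane_occupancy(region):
    return f"lane occupancy in {region}"

ANALYZERS = {"pedestrian": detect_pedestrians, "lane_occupancy": detect_lane_occupancy}

def analyze(frames):
    """Each frame carries a bound task identifier; its region of interest is
    dispatched to the image analysis unit matching that identifier."""
    return {fid: ANALYZERS[task](region) for fid, (region, task) in frames.items()}

results = analyze({
    1: ("crosswalk-left", "pedestrian"),
    2: ("lane-2", "lane_occupancy"),
})
```

Binding the task identifier to the frame (rather than to the picture) is what lets two frames on the same video picture undergo different analysis tasks.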
By adopting the technical scheme of the embodiment of the application, when the frame is drawn for the video picture, the frame is drawn based on the coordinate set meeting the frame drawing condition, and different coordinate sets can trigger different layers to be created, so that when different frames are drawn, the frames are drawn on different layers. In this way, the drawn different frames are located in different layers, so that the drawn frames are independent of each other. Therefore, when one drawn frame is edited and modified, the frame can be modified on the layer where the frame is located, the drawn frames on other layers cannot be affected, and therefore the frame does not need to be re-drawn after all the drawn frames are deleted, the user can conveniently edit and modify the drawn frames, the time length of the user for editing and modifying the frames is greatly shortened, the efficiency of video picture analysis is improved, and the user experience is optimized.
In practice, the borders of the multiple regions of interest drawn for the video picture can be multiplexed for other video pictures in the video stream, because the video stream is shot of the target scene from a fixed pose (the rotation angle and rotation direction of the dome camera), so the differences between the video pictures in the shot video stream are small, and the multiplexing of the drawn borders of the multiple regions of interest can be realized.
Accordingly, in one embodiment, the plurality of coordinate sets may be stored after the plurality of coordinate sets are obtained. In particular, multiple sets of coordinates may be stored locally to enable multiplexing of the next video picture.
Then after obtaining the image analysis result of the video picture, a next video picture of the video picture may be obtained from the video stream, and the next video picture may refer to a next key frame in the video stream.
Under the condition that the similarity between the picture content of the video picture and the picture content of the next video picture is greater than a preset threshold, creating a plurality of picture layers on a canvas corresponding to the next video picture according to a plurality of pre-stored coordinate sets meeting frame drawing conditions, and drawing and displaying a plurality of frames on the plurality of picture layers respectively, wherein the plurality of frames correspond to the plurality of coordinate sets one by one;
and then, respectively analyzing a plurality of interested areas respectively corresponding to the plurality of frames in the next video picture to obtain an image analysis result of the next video picture.
In one embodiment, determining similarity between picture contents of video pictures may refer to performing keypoint matching on a video picture and a next video picture to obtain a result of keypoint matching, where the result may include the number of keypoints on matching and a difference in positions of the keypoints on matching. The similarity may be determined according to the result of the keypoint matching, and specifically, the greater the number of keypoints on the matching, and the smaller the difference in position of the keypoints on the matching, the higher the similarity, and vice versa, the lower the similarity.
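A toy scoring function in the spirit of the paragraph above can combine the two signals it names: the number of matched keypoints and the position difference of the matches. The weighting and the normalization constants here are illustrative assumptions; the application does not specify a formula:

```python
def similarity(num_matched, mean_position_diff, max_matches=500, max_diff=50.0):
    """Toy similarity score in [0, 1]: more matched keypoints and smaller
    position differences between the matches both raise the score."""
    count_term = min(num_matched, max_matches) / max_matches
    diff_term = 1.0 - min(mean_position_diff, max_diff) / max_diff
    return 0.5 * count_term + 0.5 * diff_term

# Many matches with tiny displacement score high; few, far-apart matches score low.
high = similarity(480, 2.0)
low = similarity(20, 45.0)
```

In practice the keypoint matches themselves would come from a feature matcher (e.g. ORB or SIFT descriptors); only the scoring step is sketched here.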
The preset threshold value can be set according to requirements, and under the condition that the similarity is greater than the preset threshold value, the difference between the representation video picture and the next video picture is small, so that the multiplexing of the frame can be realized. In this way, according to a plurality of coordinate sets which are stored in advance and meet the frame drawing condition, a plurality of layers can be created on the canvas corresponding to the next video picture, and a plurality of frames can be drawn and displayed on the plurality of layers respectively, and of course, the drawn frames of a plurality of regions of interest are in one-to-one correspondence with the plurality of coordinate sets.
Therefore, the interesting region in the video picture can be subjected to image analysis based on the drawn borders of the plurality of interesting regions.
In another embodiment, the obtained next video frame may be a frame whose similarity with the video frame is greater than a preset threshold, that is, in the obtaining of the next video frame, the obtained video frame is a video frame whose difference with the video frame is small, and then a plurality of frames may be directly drawn on the next video frame.
In the above embodiment, the multiplexing of the frames is realized by multiplexing the coordinate sets. In yet another possible implementation manner, multiplexing of the drawn frames may be implemented directly. Specifically, in step S203, when the plurality of frames are drawn and displayed on the plurality of layers respectively, the plurality of frames and the corresponding plurality of coordinate sets may also be saved; in particular, the positions and shapes of the plurality of frames in the video picture may be saved. In this way, when a next video picture with a higher similarity is obtained, the positions and shapes of the frames in the video picture can be determined directly according to the saved frames and the corresponding coordinate sets, so that the regions of interest framed by the respective plurality of frames in the next video picture are analyzed respectively, and an image analysis result of the next video picture is obtained.
When the method is adopted, the step of drawing the frame is not needed, and the region of interest framed and selected by the frame is directly determined from the video picture according to the saved frame and the coordinate set of the frame, so that the efficiency of subsequent video picture analysis is improved.
Next, how the image analysis is performed will be described.
After the frame corresponding to each coordinate set is drawn and displayed on its layer, configuration parameters may be bound to each frame. The bound configuration parameters are the parameters on which the region of interest selected by the frame relies when performing an image analysis task, and in practice they may be used to describe features of the framed region of interest. For example, the configuration parameters may include a region identifier, a region area, and the identification task of the region of interest framed by the drawn border. Specifically, the process of setting configuration parameters for each drawn frame may be: binding configuration parameters to the plurality of frames respectively according to the user's configuration operations on the plurality of frames.
The configuration parameter configuration interface may be provided for a user, so that the user performs configuration parameter configuration operation in the configuration parameter configuration interface, and further may obtain configuration parameters input by the user for performing configuration parameter configuration operation, and the specific process is described in detail in the following description.
After the configuration parameters bound by each frame are obtained, the regions of interest framed and selected by the frames in the video picture can be analyzed respectively based on the configuration parameters bound by the frames, so as to obtain an image analysis result.
The configuration parameters bound to each frame can be used for indicating the attribute of the framed region of interest and the image analysis task, and the attribute can refer to the real geographic identification of the image region, so that when the image analysis task is obtained, the image analysis result of the region of interest can be generated according to the bound configuration parameters. The image analysis result may then include the properties of the region of interest indicated by the configuration parameters. For example, as shown in fig. 1, an image analysis result of a region of interest 101 is obtained, the configuration parameter bound to the region of interest 101 is a middle crosswalk, and the image analysis result is a pedestrian 10, and based on the image analysis result and the bound configuration parameter, the output image analysis result may be that the pedestrian traffic of the middle crosswalk is 10.
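The combination step in the crosswalk example can be written as a one-line formatting function. The dictionary keys `region_id` and `pedestrian_count` are hypothetical names for the bound configuration parameter and the raw analysis result:

```python
def format_result(config, analysis):
    """Combine a frame's bound configuration parameters with its raw analysis
    result into the output, as in the crosswalk example."""
    return (f"the pedestrian traffic of the {config['region_id']} "
            f"is {analysis['pedestrian_count']}")

out = format_result({"region_id": "middle crosswalk"},
                    {"pedestrian_count": 10})
```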
Of course, as described in the above embodiment, in the case that the drawn frame can be multiplexed with other video pictures in the video stream, the configuration parameters bound for the frame can also be multiplexed with other video pictures in the video stream. Correspondingly, when the next video picture is obtained and a plurality of interested areas of the next video picture need to be analyzed respectively, the image area of the frame of the next video picture can be analyzed according to the configuration parameters bound by the frame under the condition that the similarity between the image content of the image area of the video picture and the image content of the image area of the next video picture is greater than the preset threshold value, so as to obtain an image analysis result.
As described above, because the regions of interest selected by different frames can undergo image analysis of different tasks, when each frame is bound with configuration parameters, the task identifier of the image analysis can be used as a part of the configuration parameters.
The configuration parameter configuration of each frame may be as described in the following process of the embodiment:
firstly, generating an identifier of each frame in the plurality of frames, and binding the identifier with the frames; then, creating and displaying a configuration parameter editing window of each frame; correspondingly, when the configuration parameters are respectively bound for the plurality of frames according to the configuration parameter configuration operation of the plurality of frames by the user, the configuration parameters input by the user through the display configuration parameter editing window can be obtained; and binding the configuration parameters of each frame with the frame according to the saving operation of the user on each frame in the plurality of frames.
In this embodiment, the identifier of the frame may refer to an ID number or a text identifier, and is used to uniquely identify the frame, in practice, a frame set may be created for the video picture, and the frame set may include the identifier of the drawn frame. In specific implementation, when the frame is drawn, a random identifier can be generated for the drawn frame, then the random identifier is added into the frame set, and a path between the random identifier and the drawn frame is recorded, so as to complete the binding of the identifier and the frame.
After the frame is drawn, a configuration parameter editing window can be popped up, and configuration parameters input by a user through the configuration parameter editing window are obtained; then, when it is detected that the border is saved in the configuration parameter editing window by the user, the input configuration parameters are bound with the border, specifically, the input configuration parameters may be stored in the border set, and a binding relationship between the input configuration parameters and the identification of the border is established.
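The frame set, the random identifier per frame, and the binding established on the save operation can be sketched as follows; `FrameSet` and its methods are illustrative names, and a hex UUID stands in for the random identifier the text mentions:

```python
import uuid

class FrameSet:
    """Frame set for one video picture: a random identifier per drawn frame,
    with configuration parameters bound to that identifier."""

    def __init__(self):
        self.frames = {}    # identifier -> coordinate set of the drawn frame
        self.configs = {}   # identifier -> bound configuration parameters

    def add_frame(self, coord_set):
        frame_id = uuid.uuid4().hex   # random identifier generated at draw time
        self.frames[frame_id] = coord_set
        return frame_id

    def bind_config(self, frame_id, params):
        """Called on the user's save operation in the editing window."""
        if frame_id not in self.frames:
            raise KeyError("unknown frame identifier")
        self.configs[frame_id] = params

fs = FrameSet()
fid = fs.add_frame([(0, 0), (4, 0), (4, 3)])
fs.bind_config(fid, {"region_id": "middle crosswalk", "task": "pedestrian"})
```

Keying both maps by the same identifier is what makes the binding relationship between a frame and its configuration parameters unambiguous even when many frames exist.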
In practice, the configuration parameter editing window may pop up after each frame is drawn, or the configuration parameter editing window for each frame may pop up after all frames are drawn.
In an embodiment, since each frame is bound with configuration parameters, the configuration parameters bound by different frames are independent from each other, and in practice, the configuration parameters bound by the frames can be modified, specifically, a configuration parameter editing interface can be provided, the configuration parameter editing interface is displayed in response to a selected operation on a certain frame, the bound configuration parameters are displayed in the configuration parameter editing interface, and the bound configuration parameters are in an editable state, so that the bound configuration parameters can be modified, the modified configuration parameters are stored in response to a modification completion operation on the configuration parameter editing interface, and the modified configuration parameters are bound with the frames.
By adopting the embodiment, the configuration parameters of the frame can be bound for each frame so as to configure the configuration parameters required by the region of interest framed and selected by each frame, and when a certain frame is selected by a user, a configuration parameter editing window can be popped up so that the user can modify the configured configuration parameters conveniently, thereby optimizing the user experience.
Next, how to acquire a coordinate set that satisfies the frame drawing condition, and how to perform frame drawing, will be described.
In an embodiment, referring to fig. 3, fig. 3 is a flowchart illustrating a step of obtaining a coordinate set that satisfies a border drawing condition, and as shown in fig. 3, the method may specifically include the following steps:
step S301: and monitoring a coordinate acquisition event on the canvas.
In this embodiment, a coordinate acquisition event triggered by an input device on a canvas may be monitored, where the input device may be a device such as a mouse or a touch screen, and the coordinate acquisition event may be a click operation or a touch operation performed by the input device on the canvas. For example, when the input device is a mouse, a location on the canvas may be considered to trigger a coordinate acquisition event when the user operates the mouse to click on the location.
Step S302: and acquiring corresponding coordinates according to the monitored coordinate acquisition event, and adding the coordinates which are not added with any coordinate set in the acquired coordinates into the current coordinate set until a coordinate acquisition stop event triggered by the input equipment on the canvas is monitored.
In this embodiment, the canvas may carry coordinate information, the coordinate information may be created for the size of the canvas, and may include a coordinate of each position on the canvas, when the client monitors that the coordinate acquisition event is triggered on the canvas, the coordinate of the position of the coordinate acquisition event triggered on the canvas may be acquired, and then, for each monitored coordinate acquisition event, the corresponding coordinate may be acquired, and the acquired coordinate is added to the current coordinate set, and after the coordinate acquisition stop event is monitored, the current coordinate set is used as the coordinate set to be drawn by the bezel.
As described in the foregoing embodiment, different coordinate sets correspond to different frames. Therefore, after a coordinate acquisition stop event is monitored, a layer corresponding to the current coordinate set may be created, and the corresponding frame may then be drawn on that layer. When a new coordinate acquisition event on the canvas is subsequently monitored, a new coordinate set can be created and the acquisition of coordinates for a new frame started, until the next coordinate acquisition stop event is monitored; in this way, the coordinate sets corresponding to different frames can be obtained.
As described in the above embodiments, the coordinate capture stop event may refer to a double-click event of a mouse, or a user clicking a specific control in an interface, such as a border drawing control.
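Steps S301–S302 can be sketched as a small event-grouping loop. The event tuples below are a stand-in for the mouse/touch events the text describes, not a real input API:

```python
def collect_coordinate_sets(events):
    """Group click coordinates into coordinate sets.

    `events` is a sequence of ("click", (x, y)) and ("stop",) tuples, standing
    in for the coordinate acquisition / stop events monitored on the canvas.
    A stop event closes the current set (one set per frame to be drawn)."""
    sets, current = [], []
    for event in events:
        if event[0] == "click":
            current.append(event[1])   # coordinate not yet added to any set
        elif event[0] == "stop" and current:
            sets.append(current)       # this set will trigger its own layer
            current = []
    return sets

sets = collect_coordinate_sets([
    ("click", (0, 0)), ("click", (4, 0)), ("click", (4, 3)), ("stop",),
    ("click", (6, 6)), ("click", (9, 6)), ("click", (9, 9)), ("stop",),
])
# two coordinate sets -> two layers, two frames
```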
Step S303: and connecting the coordinates in the current coordinate set in sequence according to the sequence of adding the coordinates into the current coordinate set to obtain the frame to be detected.
In this embodiment, after the coordinate acquisition stop event is monitored, the frame formed by the coordinates in the coordinate set may be verified, to check whether the frame formed by those coordinates is reasonable; when it is, the current coordinate set is determined to be a coordinate set that satisfies the frame drawing condition. That is, satisfying the frame drawing condition in this embodiment means that the coordinate acquisition stop event has occurred and the frame formed by the coordinates in the coordinate set is reasonable.
The sequence in which the coordinates were added to the current coordinate set may be the order in which the coordinates were clicked. In specific implementation, the coordinates can be connected sequentially in click order, with the first-clicked coordinate also connected to the last-clicked coordinate, so that the frame to be detected is obtained.
Exemplarily, referring to fig. 4, a schematic diagram of frames to be detected formed by connecting coordinates collected on a canvas is shown. As shown in fig. 4, 5 coordinates are collected, and the sequence in which the 5 coordinates were clicked is: x1, x2, x3, x4 and x5. The coordinates are connected in click order, that is, x1 is connected with x2, x2 with x3, x3 with x4, x4 with x5, and x5 is connected back to x1, forming a frame as shown in fig. 4.
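The click-order connection, including the closing edge from the last-clicked coordinate back to the first, can be expressed in a few lines; the concrete points below merely stand in for x1..x5 in fig. 4:

```python
def to_frame(coords):
    """Connect coordinates in click order; the last-clicked coordinate is
    also connected back to the first, closing the frame to be detected."""
    n = len(coords)
    return [(coords[i], coords[(i + 1) % n]) for i in range(n)]

# Five clicked coordinates standing in for x1..x5 in fig. 4.
edges = to_frame([(0, 0), (2, 0), (3, 1), (2, 2), (0, 2)])
```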
Step S304: and carrying out polygon verification on the frame to be detected.
In this embodiment, polygon verification may be performed on the frame to be detected, where polygon verification may refer to: verifying whether the frame to be detected is a polygon. Specifically, a polygon here refers to a closed frame formed by sequentially connecting, end to end, three or more line segments that are not on the same straight line and do not intersect one another.
The detection method may include: detecting whether the frame to be detected includes a closed region, and detecting the number of closed regions it includes; when the frame to be detected includes exactly one closed region, the polygon verification of the frame to be detected is determined to pass.
As shown in fig. 4, three frames to be detected are given, namely frames 4-1, 4-2 and 4-3, all obtained by sequentially connecting the coordinates in the order x1, x2, x3, x4, x5, where frames 4-1 and 4-2 are verified as polygons, and 4-3 is not a polygon.
It can also be known from fig. 4 that, when the sequence of the user's click coordinates is not consistent, the frames verified as polygons are different, such as the frames 4-1 and 4-2, although both frames are polygons, the shapes of the two frames are not consistent, in this case, the frames verified as polygons may be pre-displayed, for example, a preview of the frame to be detected is displayed, so that the user can determine whether the pre-drawn frame is a frame that the user desires to draw according to the preview, and if not, the user can edit again in the canvas, that is, the sequence of the click coordinates is modified.
Step S305: and under the condition that the frame to be detected passes the polygon verification, taking the frame to be detected as a coordinate set meeting a frame drawing condition.
In this embodiment, when the frame to be detected is a polygon, the frame to be detected is characterized as reasonable, that is, the frame can frame out an image area, so the coordinate set may be used as a set that satisfies the frame drawing condition.
Of course, in some embodiments, if the polygon verification fails, the frame to be detected is characterized as unreasonable; if such a frame were drawn, the region of interest it selects would be unreasonable. For example, frame 4-3 in fig. 4 frames only a part of the region of the image to be identified, which is not reasonable enough. In practice, a prompt message may be output to prompt the user to click the coordinates again in the correct order.
By adopting the embodiment, the polygon verification can be carried out on the frame to be detected formed by each coordinate in the coordinate set, and when the verification is passed, the coordinate set is determined as the set meeting the frame drawing condition, so that the frame drawn subsequently is a correct and reasonable frame, and the frame drawing accuracy is improved.
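One way to realize the "single closed region" check is to verify that the closed outline is a simple polygon, i.e. that no two non-adjacent edges cross. The sketch below, under that assumption, uses a standard orientation test; it treats properly crossing edges only (touching or collinear edges are out of scope here):

```python
def _orient(p, q, r):
    # Sign of the cross product (q - p) x (r - p).
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _cross(a, b, c, d):
    # True if segments ab and cd properly cross each other.
    d1, d2 = _orient(a, b, c), _orient(a, b, d)
    d3, d4 = _orient(c, d, a), _orient(c, d, b)
    return d1 * d2 < 0 and d3 * d4 < 0

def passes_polygon_verification(coords):
    """A frame passes when it has at least three coordinates and no two
    non-adjacent edges of its closed outline cross (a single closed region)."""
    n = len(coords)
    if n < 3:
        return False
    edges = [(coords[i], coords[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share a vertex; skip them
            if _cross(*edges[i], *edges[j]):
                return False
    return True

# A convex quadrilateral passes; a self-crossing "bowtie" does not.
square_ok = passes_polygon_verification([(0, 0), (4, 0), (4, 3), (0, 3)])
bowtie_ok = passes_polygon_verification([(0, 0), (4, 3), (4, 0), (0, 3)])
```

The bowtie case corresponds to frame 4-3 in fig. 4: the same five-or-four clicks in a different order can produce an outline with crossing edges, which fails verification and would trigger the re-click prompt.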
In one specific implementation manner, two states may also be set for the canvas: an editing state and a selection state. When the canvas is in the editing state, it is characterized that a new frame needs to be drawn, so a coordinate set needs to be obtained and a new layer created to draw the new frame. When the canvas is in the selection state, it is characterized that a drawn frame needs to be modified, so the frame selected by the user needs to be determined, and a window is provided for the user to modify the shape or the configuration parameters of the drawn frame.
Specifically, when the state of the canvas is the editing state, a coordinate set meeting a frame drawing condition may be obtained according to the processes of the above step S301 to step S305, and a plurality of layers corresponding to the plurality of coordinate sets one to one may be created on the canvas corresponding to the video picture in response to the obtained plurality of coordinate sets according to the processes of the above step S202 to step S203.
Under the condition that the state of the canvas is a selection state, the selected frame can be determined according to the current coordinate of input equipment on the canvas; and then, setting the selected frame to be in an editable state according to the mark of the selected frame, and/or reading the configuration parameters bound with the selected frame and displaying the configuration parameters in a configuration parameter editing window, wherein the configuration parameters displayed in the configuration parameter editing window are in the editable state.
In this embodiment, a drawn frame on the selection state representation canvas may be selected and edited, wherein a coordinate to which a selection operation triggered by the input device on the canvas is directed may be monitored, the coordinate is the current coordinate, and the triggered selection operation may be a click operation of a mouse, which is specifically determined according to an actual situation. The selected frame can be determined according to the position relation between the current coordinate and the drawn frame.
For example, when the current coordinate is located in one drawn frame, the frame in which the current coordinate is located may be determined as the selected frame, and when the current coordinate is located in an overlapping area of a plurality of frames, a frame closer to the current coordinate may be determined as the selected frame, for example, a distance from a geometric center of each of the plurality of frames to the current coordinate may be determined, and a frame closest to the current coordinate may be determined as the selected frame.
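The two rules in the paragraph above (containment first, then distance to the geometric center on overlap) can be sketched with a standard ray-casting containment test. The mean-of-vertices center is a simplification, and the function names are illustrative:

```python
def point_in_polygon(pt, poly):
    # Standard ray-casting test: count crossings of a ray going right from pt.
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def center(poly):
    # Geometric center taken as the mean of the vertices (a simplification).
    return (sum(p[0] for p in poly) / len(poly),
            sum(p[1] for p in poly) / len(poly))

def selected_frame(click, frames):
    """Frames containing the click; if several overlap there, the one whose
    center is closest to the click is taken as the selected frame."""
    hits = [fid for fid, poly in frames.items() if point_in_polygon(click, poly)]
    if not hits:
        return None
    def dist2(fid):
        cx, cy = center(frames[fid])
        return (cx - click[0]) ** 2 + (cy - click[1]) ** 2
    return min(hits, key=dist2)

frames = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(3, 3), (8, 3), (8, 8), (3, 8)],
}
only_a = selected_frame((1, 1), frames)        # inside A only
overlap = selected_frame((3.5, 3.5), frames)   # inside both; A's center is closer
```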
The embodiment of the present application provides the following implementation manner for the case that the current coordinate is located in multiple frames:
specifically, a user draws a plurality of frames on the canvas, so that there may be a portion of the frames that may have overlapping portions, or there may be a portion where none of the frames overlap, and in this case, when determining the frame selected by the user, the frame selected by the user may be determined according to whether the current coordinates when the user selects the frame are in the overlapping portion of the frame.
Specifically, referring to fig. 5, a flowchart illustrating a step of determining a frame selected by a user is shown, and as shown in fig. 5, the method may specifically include the following steps:
step S501: and under the condition that the state of the canvas is a selection state, detecting the number of frames where the current coordinates are located.
In this embodiment, the number of the frames where the current coordinate is located may represent whether the current coordinate is located in an overlapping area of a plurality of frames, where if the number of the frames is 1, the current coordinate is represented to be located on only one frame, and if the number of the frames is 2, the current coordinate is represented to be located on 2 frames at the same time.
Step S502: and determining the frame where the current coordinate is located as the selected frame under the condition that the number of the frames where the current coordinate is located is one.
As shown above, this situation indicates that the current coordinate is located on a frame, and the frame where the current coordinate is located may be directly determined as the selected frame.
Step S503: and determining the overlapping degree between at least two frames where the current coordinate is located under the condition that the number of the frames where the current coordinate is located is at least two.
And under the condition that the number of the frames is at least two, representing that the current coordinate is positioned on the frames at the same time, and determining one frame from the frames as the selected frame according to the overlapping degree of the frames on which the current coordinate is positioned. The overlapping degree may reflect the degree of overlapping of the frames, and may reflect the ratio of the intersected area to each frame.
Step S504: and determining one frame of the at least two frames as the selected frame according to the overlapping degree.
In this embodiment, the overlap degree can be divided into the following two cases: one frame being completely covered by another frame, and the at least two frames being only partially overlapped with each other. In practice, one of the at least two frames can be determined as the selected frame according to which overlap case applies.
Of course, in practical implementation, in a case where the overlap degree indicates that one frame is completely covered by other frames, the completely covered frame may be determined as the selected frame. For example, if the frame 1 is completely covered by the frame 2, the frame 1 is determined as the selected frame.
In the case that the overlap degree characterizes that at least two frames are partially overlapped with each other, the center coordinates of the mutually partially overlapped region may be determined, and the selected frame may be determined according to respective distances between the center coordinates of the at least two frames and the center coordinates of the mutually partially overlapped region.
The overlapping degree represents the condition that at least two frames are partially overlapped with each other, which can be understood as follows: a partial area of one bezel is covered by a partial area of the other bezel. In this case, the central coordinates of the overlapping area and the respective central coordinates of each of the intersecting borders may be determined, wherein the central coordinates may be understood as the geometric central coordinates of the borders, and in practice, the distance between the central coordinates of each of the borders and the central coordinates of the overlapping area may be determined, and further, the closest border may be determined as the selected border.
Referring to fig. 6, which shows a schematic diagram of the current coordinate lying in an overlapping region of multiple frames, 4 frames have been drawn, of which frame 601, frame 602, and frame 603 partially overlap one another; the overlapping portion is the shaded area in the figure. The center coordinate of the overlapping area is y0, and the center coordinates of frame 601, frame 602, and frame 603 are y1, y2, and y3, respectively. Since y1 is closest to y0, frame 601 can be determined as the selected frame.
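The two selection rules above can be sketched as follows, under the simplifying assumption that frames are axis-aligned rectangles (general polygon intersection is more involved); all names here are illustrative, not from the source:

```typescript
type Rect = { x1: number; y1: number; x2: number; y2: number };

const center = (r: Rect) => ({ x: (r.x1 + r.x2) / 2, y: (r.y1 + r.y2) / 2 });

// Axis-aligned intersection of all candidate frames; non-empty here,
// since every candidate contains the clicked coordinate.
function intersection(rects: Rect[]): Rect {
  return rects.reduce((a, b) => ({
    x1: Math.max(a.x1, b.x1),
    y1: Math.max(a.y1, b.y1),
    x2: Math.min(a.x2, b.x2),
    y2: Math.min(a.y2, b.y2),
  }));
}

const covers = (outer: Rect, inner: Rect) =>
  outer.x1 <= inner.x1 && outer.y1 <= inner.y1 &&
  outer.x2 >= inner.x2 && outer.y2 >= inner.y2;

// Pick the selected frame: a fully covered frame wins; otherwise the frame
// whose center is closest to the center of the mutual overlap region.
function pickSelected(rects: Rect[]): number {
  for (let i = 0; i < rects.length; i++) {
    if (rects.some((r, j) => j !== i && covers(r, rects[i]))) return i;
  }
  const c0 = center(intersection(rects));
  let best = 0, bestD = Infinity;
  rects.forEach((r, i) => {
    const c = center(r);
    const d = (c.x - c0.x) ** 2 + (c.y - c0.y) ** 2;
    if (d < bestD) { bestD = d; best = i; }
  });
  return best;
}
```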
With this embodiment, because the user can select drawn frames for editing and modification while the canvas is in the selection state, the frames on which the current coordinate of the user's selection operation lies can be determined: when there is one such frame, that frame is determined as the selected frame; when there are several, one frame is screened out as the selected frame according to the overlap between them. Thus, when drawn frames have mutually overlapping areas, the frame the user intends to select can be judged intelligently from the overlapping areas, optimizing the user experience.
After the selected frame is determined, services such as modification and editing of the selected frame can be provided for the user. Specifically, one case is: the selected frame can be set to be in an editable state according to the identifier of the selected frame, or the configuration parameters bound with the selected frame can be read and displayed in a configuration parameter editing window, wherein the configuration parameters displayed in the configuration parameter editing window are in the editable state.
The other situation is that: the selected border may be set to an editable state, and the configuration parameters bound to the selected border may be read and displayed in the configuration parameter editing window.
Setting the selected frame to an editable state can be understood as follows: the layer on which the selected frame is located is brought up so that the user can conveniently edit the frame on that layer. In a specific implementation, the canvas may be set to the editing state used when drawing frames, and coordinate acquisition events, coordinate-acquisition-stop events, and the like on the canvas are then monitored. In practice, when an end-of-modification event is monitored, for example a coordinate-acquisition-stop event, a new frame may be drawn according to the newly collected coordinates, thereby modifying the selected frame.
As described above, the configuration parameter editing window may be understood as a window for inputting configuration parameters for a drawn frame, where the input configuration parameters and the drawn frame are in a one-to-one binding relationship. In practice, the process of acquiring the configuration parameters bound to the frame is described in the above embodiment.
In this embodiment, the configuration parameters bound to the selected border may be displayed in the configuration parameter editing window, and the configuration parameters displayed in the configuration parameter editing window may be set to an editable state, so that the configuration parameters in the configuration parameter editing window may be modified.
In the following, a brief description is given to the overall process of the video picture analysis method according to the embodiment of the present application, which may specifically include the following processes:
first, a video picture in a video stream is acquired.
As described in the foregoing embodiment, a frame drawing task may be newly created for the current image analysis task. Then, when the client obtains a video picture, a canvas corresponding to the video picture may be created, the video picture displayed in the canvas, and the canvas placed in the editing state, so that the user can draw corresponding frames for the video picture in the canvas. For example, as shown in fig. 1, the video picture is an intersection image; the intersection image may be displayed in the canvas, and when drawing a frame the user may trace a zebra crossing in the intersection image on the canvas.
The size of the video picture can be determined and a canvas adapted to it can be created, where the created canvas may be the same size as the video picture, or its size may be the video picture's size scaled down or up.
Then, in response to obtaining a plurality of coordinate sets meeting a border drawing condition, a plurality of layers are created on a canvas corresponding to the video picture.
With the canvas in the editing state, coordinate acquisition events triggered by an input device on the canvas can be monitored, so that a plurality of coordinate sets are obtained from those events and a plurality of frames are drawn. A corresponding configuration parameter is bound to each frame.
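The event-driven collection of coordinate sets described here might look roughly like the following, with `onClick` standing in for a monitored coordinate acquisition event and `onStop` for the stop event (both names are assumptions for illustration):

```typescript
type Point = { x: number; y: number };

// Accumulate clicked coordinates into the current set until a stop event
// (e.g. a double-click) arrives, then emit the completed coordinate set.
class CoordinateCollector {
  private current: Point[] = [];
  readonly sets: Point[][] = [];

  onClick(p: Point): void {
    this.current.push(p); // coordinate not yet in any set joins the current set
  }

  onStop(): void {
    if (this.current.length > 0) {
      this.sets.push(this.current); // one finished set per drawn frame
      this.current = [];
    }
  }
}
```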
Then, a plurality of frames are respectively drawn and displayed on the plurality of layers, the plurality of frames corresponding one to one to the plurality of coordinate sets, so that the user can check the reasonableness of the drawn frames and conveniently modify them.
And finally, analyzing a plurality of image areas respectively corresponding to the plurality of frames in the video picture to obtain an image analysis result of the video picture.
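The one-layer-per-frame bookkeeping underlying the process above can be sketched as follows; the `FrameLayers` class is a hypothetical stand-in for real canvas layers, storing only outlines instead of painting:

```typescript
type Polygon = { x: number; y: number }[];

// One layer per frame: redrawing a frame touches only its own layer,
// so the other frames never need to be repainted.
class FrameLayers {
  private layers = new Map<string, Polygon>();

  draw(id: string, outline: Polygon): void {
    this.layers.set(id, outline); // in a browser this would paint one layer
  }

  redraw(id: string, outline: Polygon): void {
    this.layers.delete(id); // clear only this frame's layer...
    this.draw(id, outline); // ...and repaint it; sibling layers are untouched
  }

  remove(id: string): void {
    this.layers.delete(id); // single-frame deletion, also per layer
  }

  count(): number {
    return this.layers.size;
  }
}
```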
Referring to fig. 7, a flowchart illustrating the steps of performing image analysis on a region of interest in a video frame is shown, and as shown in fig. 7, the method includes the following steps:
step S701: and adjusting the coordinates of each frame on the canvas according to the proportional relation between the size of the canvas and the size of the video picture.
In this embodiment, since the frames are drawn on the canvas, they are drawn in the canvas's coordinate system. As described above, the created canvas may be the same size as the video picture, or its size may be the video picture's size scaled down or up. In the latter case, the coordinates of each drawn frame can be adjusted according to the proportional relationship between the canvas size and the video picture size; adjusting the coordinates means multiplying or dividing them by that ratio, so that the size of each drawn frame matches the region of interest it is meant to frame on the video picture.
Illustratively, if the canvas size is obtained by compressing the video picture size with a compression ratio of 0.5, the coordinates of each frame are correspondingly enlarged by a factor of 2 to obtain the adjusted frame coordinates.
Because the coordinates of each frame are adjusted according to the proportional relationship between the canvas size and the video picture size, the adjusted coordinates are expressed in terms of the video picture's size. Each point of an adjusted frame can thus be regarded as a position on the video picture, and the region of interest framed by the adjusted frame on the video picture can be determined from the coordinates of its points.
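A minimal sketch of this coordinate adjustment, assuming a single uniform ratio between canvas size and video picture size (the function name is illustrative):

```typescript
type Point = { x: number; y: number };

// Map frame coordinates from canvas space back to video-picture space.
// ratio = canvasSize / pictureSize; e.g. a canvas compressed to 0.5 of the
// picture means dividing canvas coordinates by 0.5 (i.e. doubling them).
function toPictureCoords(frame: Point[], ratio: number): Point[] {
  return frame.map((p) => ({ x: p.x / ratio, y: p.y / ratio }));
}
```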
Step S702: and submitting the adjusted coordinates and the bound configuration parameters of each frame on the canvas to a task processing platform corresponding to an image analysis task, or analyzing the corresponding region of interest in the video picture based on the adjusted coordinates and the bound configuration parameters of each frame on the canvas to obtain an image analysis result.
In this embodiment, after the coordinates of each frame on the canvas are adjusted, the adjusted coordinates and the bound configuration parameters may be submitted to the task processing platform. The bound configuration parameters may be those input through the configuration parameter editing window in the above embodiment, and the task processing platform may be a platform, for example a server, that performs image analysis on the video picture: it reconstructs each adjusted frame from the adjusted coordinates, determines the region of interest framed by the adjusted frame in the video picture, and performs image analysis on that region to obtain an image analysis result.
The client can also analyze the corresponding image area in the video picture according to the adjusted coordinates and the bound configuration parameters to obtain an image analysis result.
In practice, when image analysis needs to be performed on a plurality of video pictures, the pictures may have the same size and may be captured by a dome camera at the same angle for a target scene, for example the intersection images shown in fig. 1. In this way, the drawn frames can be multiplexed.
Specifically, after each frame is drawn for one of the video pictures, the adjusted coordinates of the frame and the configuration parameters bound to each frame may be submitted to the task processing platform, and then the task processing platform may perform image analysis on the region of interest framed by the frame in each of the plurality of video pictures according to the adjusted coordinates of the frame and the configuration parameters bound to each frame.
For example, as shown in fig. 1, 5 video pictures are obtained from video streams shot at the same intersection; the pictures have the same size and similar content. After the frames corresponding to the 5 zebra crossings are drawn on one of the pictures, the area where the zebra crossings are located in each video picture can be extracted according to the frame coordinates and the bound configuration parameters, and then analyzed.
When the technical scheme of the embodiment of the application is adopted, the coordinates of each frame on the canvas can be adjusted according to the proportional relation between the size of the canvas and the size of the video picture, so that the size of the drawn frame is adaptive to the region of interest selected by the required frame on the video picture, therefore, the accuracy of extracting the region of interest to be analyzed according to the frame in the follow-up process can be improved, and more accurate image analysis can be carried out.
In yet another embodiment, the canvas may further include a clear state, which may refer to clearing the drawn frames and the configuration parameters bound to them. When an operation input on the canvas by the user for entering the clear state is detected, each frame on the canvas and its bound configuration parameters are cleared, and the canvas enters the editing state.
In this embodiment, after each frame on the canvas is cleared, the state of the canvas may be initialized to an editing state to provide a user with drawing of a new frame, and specifically, frame drawing of a new video picture may be performed.
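The clear-then-reinitialize behavior can be sketched as a small state holder; the class and member names are assumptions, not from the source:

```typescript
type CanvasState = "editing" | "selecting";

// Minimal sketch of the canvas state handling described above: clearing
// removes every frame and its bound parameters, then returns to editing.
class DrawingCanvas {
  state: CanvasState = "editing"; // editing is the default state
  frames = new Map<string, object>(); // frame id -> bound configuration params

  select(): void {
    this.state = "selecting"; // user switches to the selection state
  }

  clear(): void {
    this.frames.clear(); // drop all frames and their bound parameters
    this.state = "editing"; // re-initialize so new frames can be drawn
  }
}
```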
When a user needs to start a new image analysis task, a frame drawing task can be newly created, then a new canvas is created, the canvas enters an editing state, and then a corresponding frame set is created for the new frame drawing task, so that the new image analysis task is realized.
In the following, taking pedestrian detection on the intersection image shown in fig. 1 in a traffic management scene as an example, the video picture analysis method of the present application is described. Referring to fig. 8, which shows a schematic diagram of a frame drawing interface comprising a canvas and a configuration parameter editing window, the method includes the following processes:
and S1, newly establishing a pedestrian detection task, and acquiring a video picture from the received video stream.
And S2, loading the video picture onto the canvas, and simultaneously determining the coordinate conversion ratio between the canvas and the video picture according to the real width and height of the video picture and the real width and height of the canvas.
S3, the canvas defaults to the editing state and mouse events are initialized; the user clicks to select coordinate points and double-clicks to finish drawing a frame.
S4, after drawing is finished, the client pops up a configuration parameter editing window and displays the drawn frame on the canvas. As shown in fig. 8, the interface on the right side is the configuration parameter editing window, in which configuration parameters such as lane function, road type, and traffic flow direction can be configured.
S5, the client responds to the user clicking the save-drawing control by binding the configuration parameters input by the user to the drawn frame; if the user clicks cancel instead of save, the drawn frame is not bound with any configuration parameters.
S6, the user switches the canvas state to the selection state and can then select any one frame; the selected frame is highlighted so the user can perceive whether the frame to be edited is selected. The user can edit or delete the selected frame, and the client stores the frame as modified by the user.
S7, the user clicks save and the client responds: it processes the frame coordinates according to the coordinate conversion ratio obtained in step S2 and submits the processed coordinates and the bound configuration parameters to a subsequent algorithm bin, where the algorithm bin can be located in the client or in a server connected to the client.
S8, when the user clicks clear, the client responds by deleting all drawn frames, then automatically switches back to the editing state so that drawing of frames can start again.
By adopting the technical scheme of the embodiment of the application, the following advantages can be achieved:
on the one hand, because the present application can create a plurality of frames on a plurality of layers respectively, redrawing one frame only requires deleting that frame on its own layer and redrawing it, while the other layers are unaffected. This greatly shortens the time the user spends editing and modifying frames when there are multiple frame drawing tasks, optimizes the user experience, improves the efficiency of frame editing and modification, and in turn improves image analysis efficiency.
On the other hand, because the plurality of frames are created on separate layers, each frame can be selected independently, so a single frame can be deleted, and when a frame is modified, configuration parameters can be rebound to and edited for the independently selected frame. This facilitates setting configuration parameters for the region of interest framed by each frame and thus meets more image analysis requirements.
On yet another hand, the drawn frames or the coordinate sets satisfying the frame drawing condition can be stored, so that they can be reused for other video pictures with similar content, improving the efficiency of image analysis on video pictures.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
Referring to fig. 9, a block diagram of a video frame analysis apparatus according to an embodiment of the present application is shown, and as shown in fig. 9, the apparatus may specifically include the following modules:
a picture acquiring module 901, configured to acquire a video picture in a video stream;
a layer creating module 902, configured to, in response to obtaining multiple coordinate sets, create multiple layers corresponding to the multiple coordinate sets one to one on a canvas corresponding to the video picture, where each coordinate set is used to draw a border of a region of interest;
a frame drawing module 903, configured to draw and display a frame of the region of interest corresponding to each coordinate set on the map layer corresponding to the coordinate set;
and the picture analysis module 904 is configured to analyze the drawn interesting regions framed by the borders of the interesting regions in the video picture, respectively, so as to obtain an image analysis result of the video picture.
Optionally, each of the frames is bound with a configuration parameter, where the configuration parameter is obtained according to a configuration parameter configuration operation of a user on each of the frames, and the picture analysis module 904 is specifically configured to analyze, based on the configuration parameter bound to each of the frames, an area of interest framed by each of the frames in the video picture, respectively, to obtain an image analysis result.
Optionally, the apparatus may further include the following modules:
a storage module, configured to store the coordinate sets and the frames;
the picture acquiring module 901 is further configured to acquire a next video picture of the video pictures after obtaining an image analysis result of the video pictures;
the frame multiplexing module is used for determining the region of interest framed and selected by each of the plurality of frames in the next video frame according to the plurality of coordinate sets and the plurality of frames stored in advance under the condition that the similarity between the frame content of the video frame and the frame content of the next video frame is greater than a preset threshold value;
the picture analysis module 904 is further configured to analyze the regions of interest framed by the plurality of frames in the next video picture, respectively, to obtain an image analysis result of the next video picture.
Optionally, the apparatus further comprises:
a data storage module, configured to store configuration parameters to which the coordinate sets, the borders, and the borders are respectively bound, where the configuration parameters are obtained according to a configuration operation performed by a user on each of the borders;
the picture analysis module 904 is further configured to, when a similarity between a picture content of an image area of the video picture and a picture content of an image area of a next video picture is greater than a preset threshold, determine, according to the plurality of coordinate sets and the plurality of borders stored in advance, an area of interest framed by each of the plurality of borders in the next video picture;
and analyzing the region of interest framed and selected by the plurality of frames in the next video picture according to the configuration parameters bound by the plurality of frames to obtain an image analysis result.
Optionally, the apparatus may further include the following modules:
the first binding module is used for generating an identifier of each frame in the plurality of frames and binding the identifier with the frames;
the editing window creating module is used for creating and displaying the configuration parameter editing window of each frame;
the configuration parameter configuration module comprises:
a configuration parameter obtaining unit, configured to obtain a configuration parameter input by a user through the display configuration parameter editing window;
and the configuration parameter binding unit is used for binding the configuration parameters of each frame with the frame in response to the saving operation of a user on each frame in the plurality of frames.
Optionally, the state of the canvas comprises at least an edit state; the layer creating module 902 may be specifically configured to, in a case that the state of the canvas is an editing state, in response to the obtained multiple coordinate sets, create multiple layers corresponding to the multiple coordinate sets one to one on the canvas corresponding to the video picture.
Optionally, the state of the canvas comprises at least a selection state; the apparatus may further include the following modules:
the frame determining module is used for determining the selected frame according to the current coordinate of the input equipment on the canvas under the condition that the state of the canvas is the selected state;
and the response module is used for setting the selected frame into an editable state according to the identifier of the selected frame, and/or reading the configuration parameters bound with the selected frame and displaying the configuration parameters in the configuration parameter editing window, wherein the configuration parameters displayed in the configuration parameter editing window are in the editable state.
Optionally, when the state of the canvas is a selected state, the border determining module includes:
the number detection unit is used for detecting the number of the frames where the current coordinates are located under the condition that the state of the canvas is a selection state;
a first determining unit, configured to determine, as the selected border, the border in which the current coordinate is located when the number of borders in which the current coordinate is located is one;
a second determining unit, configured to determine, when the number of frames in which the current coordinate is located is at least two, an overlapping degree between at least two frames in which the current coordinate is located;
and the third determining unit is used for determining one frame of the at least two frames as the selected frame according to the overlapping degree.
Optionally, the third determining unit includes:
the first determining subunit is used for determining a completely covered border as the selected border under the condition that the overlap degree indicates that one border is completely covered by other borders;
and the second determining subunit is used for determining the central coordinates of the mutual partial overlapping areas under the condition that the overlapping degree characterizes that the at least two frames are mutually partially overlapped, and determining the selected frame according to the respective distances between the central coordinates of the at least two frames and the central coordinates of the mutual partial overlapping areas.
Optionally, the state of the canvas further comprises an empty state; the apparatus may further include the following modules:
and the emptying module is used for emptying each frame on the canvas and entering an editing state when the operation of entering the emptying state, which is input on the canvas by a user, is detected.
Optionally, the apparatus may further include a coordinate set obtaining module, which may specifically include the following units:
the monitoring unit is used for monitoring a coordinate acquisition event on the canvas;
the coordinate adding unit is used for collecting corresponding coordinates according to the monitored coordinate acquisition events, and adding each collected coordinate that has not been added to any coordinate set to the current coordinate set, until a coordinate-acquisition-stop event triggered by the input device on the canvas is monitored;
the drawing unit is used for sequentially connecting all coordinates in the current coordinate set according to the sequence of adding the coordinates into the current coordinate set to obtain a frame to be detected;
the verification unit is used for performing polygon verification on the frame to be detected;
and the determining unit is used for taking the current coordinate set as the coordinate set of a frame to be drawn in the case that the frame to be detected passes the polygon verification.
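One way to implement the polygon verification step is a minimal validity check: the candidate coordinate set must have at least three vertices and enclose a non-zero area, computed with the shoelace formula (self-intersection checks are omitted in this sketch, and the function name is illustrative):

```typescript
type Point = { x: number; y: number };

// Polygon verification for a candidate coordinate set: at least three
// vertices, and a non-zero enclosed area via the shoelace formula, which
// rejects degenerate (e.g. collinear) outlines.
function isValidFrame(coords: Point[]): boolean {
  if (coords.length < 3) return false;
  let twiceArea = 0;
  for (let i = 0; i < coords.length; i++) {
    const a = coords[i];
    const b = coords[(i + 1) % coords.length]; // wrap around to close the frame
    twiceArea += a.x * b.y - b.x * a.y;
  }
  return twiceArea !== 0;
}
```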
It should be noted that the device embodiments are similar to the method embodiments, so that the description is simple, and reference may be made to the method embodiments for relevant points. The video picture analysis device in the embodiment of the present application may be located in a client or in a terminal device.
The embodiment of the present application further provides an electronic device, which may be configured to execute the video picture analysis method and may include a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is configured to execute the video picture analysis method according to the embodiment of the present application.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program for enabling a processor to execute the video picture analysis method according to the embodiments of the present application.
The embodiment of the present application further provides a computer program product, which includes a computer program or computer instructions, and the computer program or the computer instructions, when executed by a processor, implement the video picture analysis method according to the embodiment of the present application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The video picture analysis method, apparatus, device, medium and program product provided by the present application are introduced in detail, and a specific example is applied to illustrate the principle and implementation manner of the present application, and the description of the above embodiment is only used to help understand the method and core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (15)

1. A method for video picture analysis, the method comprising:
acquiring a video picture in a video stream;
in response to the obtained coordinate sets, creating, on a canvas corresponding to the video picture, a plurality of layers in one-to-one correspondence with the coordinate sets, wherein each coordinate set is used for drawing a border of a region of interest;
drawing and displaying, on the layer corresponding to each coordinate set, the border of the region of interest corresponding to that coordinate set;
and analyzing, respectively, the regions of interest framed by the drawn borders in the video picture, to obtain an image analysis result of the video picture.
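Claim 1's layer-per-border scheme can be illustrated with a minimal sketch. The `Canvas` class below is hypothetical (the claim prescribes no data structure): each obtained coordinate set gets its own layer, and the border is drawn on that layer, so borders can later be edited or cleared independently of the video picture and of one another.

```python
class Canvas:
    """Hypothetical canvas holding one layer per coordinate set (claim 1)."""

    def __init__(self):
        self.layers = []  # one layer per coordinate set

    def add_layers(self, coordinate_sets):
        # Create layers in one-to-one correspondence with the coordinate sets.
        created = []
        for coords in coordinate_sets:
            layer = {"coords": coords, "border": None}
            self.layers.append(layer)
            created.append(layer)
        return created

    def draw_borders(self):
        # "Drawing" here just records the closed polyline; a real UI
        # would rasterize it onto the layer for display.
        for layer in self.layers:
            layer["border"] = layer["coords"] + [layer["coords"][0]]
```

Because each border lives on its own layer, selecting or clearing one region of interest never requires redrawing the others.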
2. The method according to claim 1, wherein each border is bound with configuration parameters, the configuration parameters being obtained from a user's configuration operations on the configuration parameters of each border; and analyzing, respectively, the regions of interest framed by the drawn borders in the video picture to obtain the image analysis result of the video picture comprises:
analyzing, respectively, the regions of interest framed by the plurality of borders in the video picture based on the configuration parameters bound to the respective borders, to obtain the image analysis result.
3. The method according to claim 1, wherein, after a plurality of coordinate sets satisfying a border drawing condition are obtained, the method further comprises:
storing the plurality of coordinate sets and the plurality of borders;
after the image analysis result of the video picture is obtained, acquiring a next video picture following the video picture;
in a case where the similarity between the picture content of the video picture and the picture content of the next video picture is greater than a preset threshold, determining, according to the pre-stored plurality of coordinate sets and plurality of borders, the region of interest framed by each of the plurality of borders in the next video picture;
and analyzing the region of interest framed by each of the plurality of borders in the next video picture, to obtain an image analysis result of the next video picture.
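Claim 3's reuse of stored borders can be sketched as follows. The claim fixes neither the similarity metric nor the threshold; the mean-absolute-difference metric, the `analyze_roi` callback, and the default threshold of 0.9 below are all illustrative assumptions.

```python
def mean_abs_diff_similarity(frame_a, frame_b):
    """Similarity in [0, 1] from mean absolute pixel difference over
    flat 8-bit pixel sequences (one possible metric; the claim does
    not prescribe a specific one)."""
    assert len(frame_a) == len(frame_b)
    diff = sum(abs(a - b) for a, b in zip(frame_a, frame_b))
    return 1.0 - diff / (255.0 * len(frame_a))


def analyze_next_picture(next_frame, prev_frame, stored_borders,
                         analyze_roi, threshold=0.9):
    """If consecutive pictures are similar enough, reuse the stored
    borders; otherwise return None to signal that borders must be
    obtained anew."""
    if mean_abs_diff_similarity(prev_frame, next_frame) > threshold:
        return [analyze_roi(next_frame, border) for border in stored_borders]
    return None  # caller must obtain new coordinate sets
```

This avoids asking the user to redraw regions of interest for every frame of a largely static scene.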
4. The method of claim 2, further comprising:
storing the coordinate sets, the borders, and the configuration parameters bound to the borders, wherein the configuration parameters are obtained from a user's configuration operations on the configuration parameters of each border;
wherein analyzing, respectively, the regions of interest framed by the plurality of borders in the next video picture to obtain the image analysis result of the next video picture comprises:
in a case where the similarity between the picture content of the image region of the video picture and the picture content of the image region of the next video picture is greater than a preset threshold, determining, according to the pre-stored plurality of coordinate sets and plurality of borders, the region of interest framed by each of the plurality of borders in the next video picture;
and analyzing, according to the configuration parameters bound to the plurality of borders, the regions of interest framed by the plurality of borders in the next video picture, to obtain the image analysis result.
5. The method of claim 2, further comprising:
generating an identifier for each of the plurality of borders, and binding the identifier with that border;
creating and displaying a configuration parameter editing window for each border;
acquiring the configuration parameters input by a user through the displayed configuration parameter editing window;
and, in response to the user's save operation on each of the plurality of borders, binding the configuration parameters of each border with the identifier of that border.
6. The method of any of claims 1-5, wherein the state of the canvas comprises at least an editing state; and creating, in response to the obtained coordinate sets, the plurality of layers in one-to-one correspondence with the coordinate sets on the canvas corresponding to the video picture comprises:
in a case where the state of the canvas is the editing state, creating, in response to the obtained coordinate sets, the plurality of layers in one-to-one correspondence with the coordinate sets on the canvas corresponding to the video picture.
7. The method of claim 6, wherein the state of the canvas further comprises a selection state; and the method further comprises:
in a case where the state of the canvas is the selection state, determining a selected border according to the current coordinate of an input device on the canvas;
and setting the selected border to an editable state according to the identifier of the selected border, and/or reading the configuration parameters bound to the selected border and displaying them in a configuration parameter editing window, wherein the configuration parameters displayed in the configuration parameter editing window are in an editable state.
8. The method of claim 7, wherein determining, in a case where the state of the canvas is the selection state, the selected border according to the current coordinate of the input device on the canvas comprises:
detecting, in a case where the state of the canvas is the selection state, the number of borders within which the current coordinate lies;
in a case where the current coordinate lies within one border, determining that border as the selected border;
in a case where the current coordinate lies within at least two borders, determining the degree of overlap between the at least two borders;
and determining one of the at least two borders as the selected border according to the degree of overlap.
9. The method of claim 8, wherein determining one of the at least two borders as the selected border according to the degree of overlap comprises:
in a case where the degree of overlap indicates that one border is completely covered by another border, determining the completely covered border as the selected border;
and, in a case where the degree of overlap indicates that the at least two borders partially overlap one another, determining the center coordinate of the mutual partial overlap region, and determining the selected border according to the respective distances between the center coordinates of the at least two borders and the center coordinate of the mutual partial overlap region.
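The selection logic of claims 8 and 9 can be sketched as below. For simplicity the sketch assumes axis-aligned rectangular borders `(x1, y1, x2, y2)`, while the claims allow arbitrary polygons; the tie-breaking by squared distance to the overlap-region center is one plausible reading of "respective distances".

```python
def contains(box, pt):
    x1, y1, x2, y2 = box
    return x1 <= pt[0] <= x2 and y1 <= pt[1] <= y2


def fully_covered(inner, outer):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def select_border(borders, pt):
    hits = [b for b in borders if contains(b, pt)]
    if not hits:
        return None
    if len(hits) == 1:          # claim 8: exactly one border hit
        return hits[0]
    # Claim 9, case 1: a border completely covered by another is selected,
    # since it could never be picked otherwise.
    for b in hits:
        if any(b is not o and fully_covered(b, o) for o in hits):
            return b
    # Claim 9, case 2: partial overlap - choose the border whose center
    # is closest to the center of the shared overlap region.
    ox1 = max(b[0] for b in hits); oy1 = max(b[1] for b in hits)
    ox2 = min(b[2] for b in hits); oy2 = min(b[3] for b in hits)
    oc = ((ox1 + ox2) / 2.0, (oy1 + oy2) / 2.0)

    def dist2(b):
        cx, cy = center(b)
        return (cx - oc[0]) ** 2 + (cy - oc[1]) ** 2

    return min(hits, key=dist2)
```

Preferring the fully covered border keeps small nested regions selectable even when a large border encloses them.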
10. The method of any of claims 6-9, wherein the state of the canvas further comprises an empty state; and the method further comprises:
upon detecting an operation by which a user enters the empty state on the canvas, clearing each border on the canvas and entering the editing state.
11. The method according to claim 1, wherein a coordinate set satisfying the border drawing condition is obtained by:
listening for coordinate acquisition events on the canvas;
acquiring the corresponding coordinates according to the monitored coordinate acquisition events, and adding, to the current coordinate set, those acquired coordinates that have not been added to any coordinate set, until a coordinate acquisition stop event triggered by the input device on the canvas is detected;
connecting all coordinates in the current coordinate set in sequence, in the order in which they were added to the current coordinate set, to obtain a border to be verified;
performing polygon verification on the border to be verified;
and, in a case where the border to be verified passes the polygon verification, taking the current coordinate set as a coordinate set of a border to be drawn.
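The event loop and polygon check of claim 11 can be sketched as follows. The event encoding `(kind, coord)` and the minimal validity test (at least three distinct, non-collinear points) are illustrative assumptions; a production implementation might also reject self-intersecting outlines.

```python
def is_valid_polygon(coords):
    """Minimal polygon verification: at least three distinct,
    non-collinear points (one possible check; the claim does not
    prescribe the verification rules)."""
    pts = list(dict.fromkeys(coords))  # dedupe, preserving order
    if len(pts) < 3:
        return False
    # Cross product of the first edge with each later point;
    # all zero means every point is collinear.
    (x0, y0), (x1, y1) = pts[0], pts[1]
    return any((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) != 0
               for x, y in pts[2:])


def collect_border(events):
    """Consume (kind, coord) events until a 'stop' event; return the
    coordinate set if it passes polygon verification, else None."""
    current = []
    for kind, coord in events:
        if kind == "stop":
            break
        if kind == "coord" and coord not in current:
            current.append(coord)
    return current if is_valid_polygon(current) else None
```

Rejecting degenerate outlines before storing them means every saved coordinate set is guaranteed to frame a non-empty region of interest.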
12. A video picture analysis apparatus, characterized in that the apparatus comprises:
a picture acquisition module, configured to acquire a video picture in a video stream;
a layer creation module, configured to create, in response to obtained coordinate sets, a plurality of layers in one-to-one correspondence with the coordinate sets on a canvas corresponding to the video picture, wherein each coordinate set is used for drawing a border of a region of interest;
a border drawing module, configured to draw and display, on the layer corresponding to each coordinate set, the border of the region of interest corresponding to that coordinate set;
and a picture analysis module, configured to analyze, respectively, the regions of interest framed by the drawn borders in the video picture, to obtain an image analysis result of the video picture.
13. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the video picture analysis method according to any of claims 1-11.
14. A computer-readable storage medium storing a computer program for causing a processor to execute the video picture analysis method according to any one of claims 1 to 11.
15. A computer program product comprising a computer program or computer instructions, characterized in that the computer program or computer instructions, when executed by a processor, implement the video picture analysis method according to any of claims 1-11.
CN202111222803.8A 2021-10-20 2021-10-20 Video picture analysis method, apparatus, device, medium, and program product Pending CN114120170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111222803.8A CN114120170A (en) 2021-10-20 2021-10-20 Video picture analysis method, apparatus, device, medium, and program product


Publications (1)

Publication Number Publication Date
CN114120170A 2022-03-01

Family

ID=80376124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111222803.8A Pending CN114120170A (en) 2021-10-20 2021-10-20 Video picture analysis method, apparatus, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN114120170A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048008A (en) * 2022-06-17 2022-09-13 浙江中控技术股份有限公司 Visualization method and device for objects in HMI (human machine interface) picture
CN115048008B (en) * 2022-06-17 2023-08-15 浙江中控技术股份有限公司 Method and equipment for visualizing objects in HMI (human machine interface) picture

Similar Documents

Publication Publication Date Title
Saxena et al. Make3d: Learning 3d scene structure from a single still image
CN109815843B (en) Image processing method and related product
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
CN109816745B (en) Human body thermodynamic diagram display method and related products
Pantoja-Rosero et al. Generating LOD3 building models from structure-from-motion and semantic segmentation
KR101989089B1 (en) Method and system for authoring ar content by collecting ar content templates based on crowdsourcing
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
US11461980B2 (en) Methods and systems for providing a tutorial for graphic manipulation of objects including real-time scanning in an augmented reality
CN112085840A (en) Semantic segmentation method, device, equipment and computer readable storage medium
CA3149576A1 (en) Image analysis
CN113793382A (en) Video image splicing seam searching method and video image splicing method and device
Nóbrega et al. Interactive 3D content insertion in images for multimedia applications
Orhei et al. TMBuD: A dataset for urban scene building detection
CN114120170A (en) Video picture analysis method, apparatus, device, medium, and program product
Wang et al. Improving facade parsing with vision transformers and line integration
Zhu et al. Large-scale architectural asset extraction from panoramic imagery
CN114092670A (en) Virtual reality display method, equipment and storage medium
CN112734747B (en) Target detection method and device, electronic equipment and storage medium
Sun et al. Automated human use mapping of social infrastructure by deep learning methods applied to smart city camera systems
CN115131407B (en) Robot target tracking method, device and equipment oriented to digital simulation environment
CN115760886B (en) Land parcel dividing method and device based on unmanned aerial vehicle aerial view and related equipment
Vouzounaras et al. Automatic generation of 3D outdoor and indoor building scenes from a single image
CN103903269B (en) The description method and system of ball machine monitor video
CN114913470A (en) Event detection method and device
CN114840288A (en) Rendering method of distribution diagram, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination