CN113657308A - Data labeling method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN113657308A
Authority
CN
China
Prior art keywords
image
video
data
target object
space model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110963207.9A
Other languages
Chinese (zh)
Inventor
侯欣如
刘浩敏
姜翰青
王楠
盛崇山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110963207.9A
Publication of CN113657308A
Legal status: Withdrawn (current)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides a data labeling method and apparatus, a computer device, and a storage medium, wherein the method comprises: generating a region space model corresponding to a target object, wherein the region space model comprises a plurality of sub-region space models that an image acquisition device needs to reach when capturing images of the target object, and the poses respectively corresponding to the plurality of sub-region space models; controlling the image acquisition device to capture images of the target object based on the region space model to obtain a video of the target object; obtaining attribute annotation data for the video based on the video; and generating target acquisition data based on the attribute annotation data and the video.

Description

Data labeling method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a data annotation method and apparatus, a computer device, and a storage medium.
Background
When managing equipment, a digital asset corresponding to the equipment may be generated by performing data annotation on captured images of it. However, for tower-, column-, or rod-type equipment such as high-voltage towers, the great height of the equipment makes it difficult to capture it completely in an image, and data annotation therefore becomes difficult.
Disclosure of Invention
Embodiments of the present disclosure provide at least a data annotation method, a data annotation apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data annotation method, including: generating a region space model corresponding to a target object, wherein the region space model comprises a plurality of sub-region space models that an image acquisition device needs to reach when capturing images of the target object, and the poses respectively corresponding to the plurality of sub-region space models; controlling the image acquisition device to capture images of the target object based on the region space model to obtain a video of the target object; obtaining attribute annotation data for the video based on the video; and generating target acquisition data based on the attribute annotation data and the video.
In this way, the region space model corresponding to the target object is generated from the poses respectively corresponding to the sub-region space models, so the image acquisition device can be controlled to reach each sub-region space model according to its pose and capture images of the target object there, and the target object can be photographed more completely. When the acquired video is then used for data annotation, the target acquisition data can be generated more easily and accurately.
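As an illustrative aid (not part of the patent disclosure), the following Python sketch shows one minimal way to represent the region space model described above; the class and field names are assumptions chosen for readability.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SubRegionSpaceModel:
    # One sub-region space model: a patch the image acquisition device must reach.
    position: Tuple[float, float, float]  # patch centre in world coordinates (metres)
    normal: Tuple[float, float, float]    # outward normal of the patch, i.e. its orientation
    size: Tuple[float, float]             # (arc length, height) of the patch (metres)

@dataclass
class RegionSpaceModel:
    # The region space model: all sub-region space models together with their poses.
    sub_regions: List[SubRegionSpaceModel]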
In an optional embodiment, the generating a region space model corresponding to the target object includes: detecting that the image acquisition device has reached a preset position relative to the target object, and acquiring first pose information of the image acquisition device; and generating the region space model based on the first pose information of the image acquisition device.
In an optional embodiment, the detecting that the image acquisition device has reached a preset position relative to the target object includes: controlling the image acquisition device to move above the target object and acquire an overhead view image of the target object; and, in a case where the target object is located in a preset area in the overhead view image, determining that the image acquisition device has reached the preset position relative to the target object.
In this way, by controlling the image acquisition device to acquire an overhead view image, its position can be adjusted while its pose is accurately known, so that it can be determined when the device has reached the preset position relative to the target object and the device can be controlled to reach that position. Because the shooting angle of the image acquisition device towards the target object can be accurately controlled, a video with complete angles and a complete shooting range can be acquired for the target object; such a video better shows the structure and characteristics of the target object, which in turn makes subsequent annotation of the target object with the video easier.
In an optional embodiment, the generating the region space model based on the first pose information of the image acquisition device includes: determining a bounding box corresponding to the target object in the overhead view image; determining a projection position of the region space model in the overhead view image based on the position of the bounding box in the overhead view image; determining poses respectively corresponding to a plurality of sub-region space models of the region space model based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and parameter information of the target object; and determining the region space model based on the poses respectively corresponding to the plurality of sub-region space models.
In this way, the bounding box corresponding to the target object shown in the overhead view image describes the specific position of the target object more accurately; the projection position of the region space model is therefore determined from the position of the bounding box, the region space model is corrected with the pose of the image acquisition device, and its specific dimensions are determined from the parameter information of the target object, so that the resulting region space model can enclose the target object. When the region space model is then used for image acquisition, the video of the target object can be captured more completely.
In an optional embodiment, the parameter information of the target object includes a height of the target object; the determining, based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the parameter information of the target object, poses respectively corresponding to a plurality of sub-region space models of the region space model includes: determining an annular cylindrical surface model surrounding the target object based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the height of the target object; dividing the annular cylindrical surface model based on a preset sub-region space model size to obtain a plurality of sub-region space models; and determining the pose of each sub-region space model according to the position of each sub-region space model in the annular cylindrical surface model and the pose of the annular cylindrical surface model.
In this way, using the relatively accurate region space model and the preset sub-region space model size, sub-region space models of equal size can be determined relatively accurately, and the plurality of sub-region space models in the region space model help the image acquisition device capture complete and clear images of multiple regions of the target object.
In an optional embodiment, the controlling, based on the region space model, an image acquisition device to capture images of the target object to obtain a video of the target object includes: controlling the image acquisition device to move around the target object based on the poses respectively corresponding to the plurality of sub-region space models in the region space model, and controlling the image acquisition device to capture a first video of the target object while it reaches the image shooting areas respectively corresponding to the plurality of sub-region space models.
In an optional embodiment, the controlling the image acquisition device to capture images of the target object based on the region space model includes: detecting, based on the poses respectively corresponding to the plurality of sub-region space models in the region space model and second pose information of the image acquisition device when capturing the first video, whether there is a position range corresponding to a target sub-region space model that has not been reached; and, in response to determining that such a position range exists, controlling the image acquisition device to move to the position range corresponding to the target sub-region space model based on the pose of that model, and capturing a second video.
In this way, by detecting which sub-region space models were not reached by the image acquisition device while the first video was captured, the regions of the target object that could not be properly captured can be identified and captured again as a second video; supplementing the first video with the second video yields a video with a more complete shooting angle and shooting range for the target object. When the second video is captured, the image acquisition device can be accurately moved to the position range corresponding to the target sub-region space model based on its pose, which improves acquisition efficiency and makes it easier to obtain a second video with complete and clear angles. Moreover, the video obtained after supplementing the second video better shows the structure and characteristics of the target object, which further facilitates subsequent annotation with the video.
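As an illustration only (not part of the patent disclosure), the following Python sketch shows one way such an "unreached sub-region" check could be performed; the distance threshold and data layout are assumptions.

import math

def unreached_sub_regions(sub_region_positions, drone_positions, reach_radius=1.0):
    # sub_region_positions: (x, y, z) centres of the sub-region space models (metres)
    # drone_positions: (x, y, z) samples of the second pose information recorded
    #                  while the first video was captured
    # reach_radius: assumed threshold for counting a sub-region as "reached"
    missed = []
    for patch in sub_region_positions:
        reached = any(math.dist(patch, p) <= reach_radius for p in drone_positions)
        if not reached:
            missed.append(patch)
    return missed

# The image acquisition device would then be sent to each missed position
# range to capture the second video.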
In an optional embodiment, the obtaining attribute annotation data for the video based on the video includes: determining key frame images from the video; and, in response to attribute annotation being performed on an object to be annotated in a key frame image, generating attribute annotation data for the video based on the annotation data obtained from that attribute annotation.
In an optional embodiment, the obtaining attribute annotation data for the video based on the video includes: displaying a key frame image of the video; and, in response to a first annotation operation on the object to be annotated in the key frame image, generating annotation data corresponding to the first annotation operation.
In this way, by determining key frame images from the video, the target object can be annotated on individual images, so fewer images need to be processed during annotation and the annotation is simpler and more convenient. At the same time, because a key frame image is a video frame image of the video, its annotation can be synchronized to the other video frame images of the video without annotating every video frame image frame by frame, which effectively improves the efficiency of data annotation.
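Purely as an illustration (not part of the patent disclosure), the sketch below picks key frames with a fixed sampling step; the patent does not prescribe a particular selection strategy, so the step value is an assumption.

def select_key_frames(num_frames, step=30):
    # Return the indices of the frames treated as key frame images.
    return list(range(0, num_frames, step))

# Example: a 300-frame video with step=30 gives key frames [0, 30, ..., 270];
# only these frames are annotated, and the remaining frames inherit their labels.
print(select_key_frames(300))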
In an optional implementation, the performing attribute annotation on the object to be annotated in the key frame image includes: performing semantic segmentation processing on the key frame image, and generating annotation data for the object to be annotated based on the result of the semantic segmentation processing.
In this way, the object to be annotated can be annotated automatically through semantic segmentation, which is simpler and more convenient.
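As a non-authoritative illustration of how a segmentation result could be turned into annotation data, the following NumPy sketch derives one bounding-box record per class present in a label mask; the record format is an assumption.

import numpy as np

def mask_to_annotations(label_mask, class_names):
    # label_mask: H x W array of class ids produced by semantic segmentation
    # class_names: dict mapping class id -> label string
    annotations = []
    for class_id, name in class_names.items():
        ys, xs = np.where(label_mask == class_id)
        if xs.size == 0:
            continue
        annotations.append({
            "label": name,
            "bbox": [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())],
        })
    return annotations

# Toy example: class 1 ("lookout_hole") occupies a 2 x 2 block of a 4 x 4 mask.
mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1
print(mask_to_annotations(mask, {1: "lookout_hole"}))
# -> [{'label': 'lookout_hole', 'bbox': [1, 1, 2, 2]}]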
In an optional implementation, the performing attribute annotation on the object to be annotated in the key frame image includes: generating a preview image based on the key frame image and displaying the preview image, wherein the resolution of the preview image is lower than that of the key frame image; in response to a second annotation operation on the object to be annotated in the preview image, generating annotation data corresponding to the preview image; and obtaining the annotation data of the key frame image based on the annotation data corresponding to the preview image.
In this way, because the preview image has a lower resolution than the key frame image, less data needs to be transferred when presenting the preview image to the user. The higher transmission speed allows the image used for data annotation to be shown to the user more quickly, so the speed of data annotation increases correspondingly.
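The mapping from annotations on the low-resolution preview back to the key frame can be as simple as a scale transform; the following sketch is an assumption-level illustration, not part of the patent text.

def preview_to_keyframe_coords(points, preview_size, keyframe_size):
    # points: (x, y) pixel coordinates drawn on the preview image
    # preview_size / keyframe_size: (width, height) of the two images
    sx = keyframe_size[0] / preview_size[0]
    sy = keyframe_size[1] / preview_size[1]
    return [(x * sx, y * sy) for x, y in points]

# A point drawn at (100, 50) on a 960x540 preview of a 3840x2160 key frame
# maps to (400.0, 200.0) on the key frame.
print(preview_to_keyframe_coords([(100, 50)], (960, 540), (3840, 2160)))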
In an optional embodiment, the generating target acquisition data based on the attribute annotation data and the video includes: for any video frame image of the video, in response to detecting that the video frame image is a non-key-frame image, determining a target key frame image corresponding to that video frame image from among the key frame images; and generating annotation information for that video frame image based on the annotation information of the target key frame image.
In this way, by determining a corresponding target key frame image for each video frame image, the annotation data of the target key frame image can be synchronized to the other video frame images of the video, so that less data needs to be annotated manually while the annotation of all video frame images of the video is completed faster and more efficiently.
In an optional embodiment, the determining a target key frame image for the video frame image from among the key frame images includes: determining the target key frame image for the video frame image based on a first position of each key frame image in the video and a second position of the video frame image in the video.
In this way, determining the target key frame image from image positions allows the subsequent synchronized annotation of non-key-frame video frame images to be performed more accurately.
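As a hedged illustration of this position-based synchronization (not the patent's prescribed algorithm), the sketch below assigns each frame the annotations of the nearest key frame by frame index.

def nearest_key_frame(frame_index, key_frame_indices):
    # Choose the key frame whose position in the video is closest to this frame.
    return min(key_frame_indices, key=lambda k: abs(k - frame_index))

def propagate_annotations(num_frames, key_frame_annotations):
    # key_frame_annotations: dict mapping key frame index -> annotation data
    keys = sorted(key_frame_annotations)
    return {i: key_frame_annotations[nearest_key_frame(i, keys)]
            for i in range(num_frames)}

# Frames 0..9 with key frames 0 and 6: frames 0-3 inherit the annotations of
# key frame 0, frames 4-9 inherit those of key frame 6.
print(propagate_annotations(10, {0: "labels_A", 6: "labels_B"}))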
In an optional embodiment, the method further comprises: acquiring second pose information of the image acquisition device when capturing the video; and the generating target acquisition data based on the attribute annotation data and the video comprises: generating the target acquisition data based on the attribute annotation data, the video, and the second pose information of the image acquisition device when capturing the video.
In this way, using the second pose information recorded while the video was captured, the attribute annotation data can be related to the target object in the video and stored or displayed in association with it according to the determined relative position relationship. In the resulting target acquisition data, the attribute annotation data can therefore be associated with specific positions in the video, so that during storage and display the association between the attribute annotation data and the target object in the video is stronger and the correspondence is clearer.
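A minimal sketch of how such target acquisition data might be packaged, assuming a JSON record layout that the patent does not specify:

import json

def build_target_acquisition_data(video_path, frame_annotations, frame_poses):
    # frame_annotations: dict frame index -> annotation records
    # frame_poses: dict frame index -> second pose information, e.g. (x, y, z, yaw, pitch, roll)
    record = {
        "video": video_path,
        "frames": [
            {
                "index": i,
                "pose": frame_poses.get(i),
                "annotations": frame_annotations.get(i, []),
            }
            for i in sorted(set(frame_annotations) | set(frame_poses))
        ],
    }
    return json.dumps(record, ensure_ascii=False, indent=2)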
In a second aspect, an embodiment of the present disclosure further provides a data annotation apparatus, including: a generating module configured to generate a region space model corresponding to a target object, wherein the region space model comprises a plurality of sub-region space models that an image acquisition device needs to reach when capturing images of the target object, and the poses respectively corresponding to the plurality of sub-region space models; a control module configured to control the image acquisition device to capture images of the target object based on the region space model to obtain a video of the target object; an acquisition module configured to obtain attribute annotation data for the video based on the video; and a first processing module configured to generate target acquisition data based on the attribute annotation data and the video.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, the memory storing machine-readable instructions executable by the processor, the processor being configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, causing the processor to perform the steps of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, the computer program, when run, performing the steps of the first aspect or any possible implementation of the first aspect.
For the description of the effects of the data annotation device, the computer device, and the computer-readable storage medium, reference is made to the description of the data annotation method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art may derive further related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a data annotation method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a target object provided by an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a graphical display interface displaying a top view image of a target object provided by an embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a specific method for determining poses corresponding to a plurality of sub-area space models of an area space model according to an embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a determined toroidal cylinder model provided by embodiments of the present disclosure;
FIG. 6 is a schematic diagram of a sub-region spatial model provided by an embodiment of the present disclosure;
fig. 7 is a schematic diagram illustrating a plurality of sub-region space models corresponding to an annular cylindrical surface model after the annular cylindrical surface model is divided by using a preset sub-region space model size according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating a spatial model of a display area using a graphical display interface provided by an embodiment of the present disclosure;
FIG. 9 is a diagram illustrating a graphical display interface for displaying a key frame image and an annotation control according to an embodiment of the disclosure;
FIG. 10 is a schematic diagram illustrating a graphical display interface displaying preview images provided by an embodiment of the present disclosure;
FIG. 11 is a flow chart illustrating one embodiment of the present disclosure for performing data annotation;
fig. 12 shows a flowchart corresponding to an embodiment of performing data annotation on a machine room according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a data annotation device provided in an embodiment of the present disclosure;
fig. 14 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research has shown that when tower-, column-, or rod-type equipment such as a high-voltage electric tower is annotated to generate digital assets, the great height of the equipment usually forces a worker to capture images by shooting upward from the base of the equipment. Under the influence of the resulting pose of the image acquisition device, the target object cannot be photographed completely, and the acquired images are difficult to annotate.
Based on this research, the present disclosure provides a data annotation method in which an image acquisition device is controlled to capture images of a target object according to a generated region space model corresponding to the target object, so that the target object can be photographed more completely. The acquired video is then used for data annotation, and the target acquisition data can be generated more easily and accurately.
The drawbacks described above are the result of the inventor's practical and careful study; therefore, the process of discovering these problems and the solutions proposed by the present disclosure should both be regarded as the inventor's contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiments, a data annotation method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the data annotation method provided by the embodiments of the present disclosure is generally a data processing device with a certain computing capability, which includes, for example, a terminal device or a server or other processing device; the terminal device may be a User Equipment (UE), a mobile device (e.g., a tablet or a mobile phone in the examples described below), a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In a possible case, the target object may also be equipped with a dedicated data processing device, for example a management computer dedicated to managing the annotation data of the target object, or a portable handheld management device; this may be determined according to the actual situation and is not detailed here. In addition, in some possible implementations, the data annotation method may be implemented by a processor calling computer-readable instructions stored in a memory.
The data annotation method provided by the embodiment of the present disclosure is explained below.
Referring to fig. 1, a flowchart of a data annotation method provided in the embodiment of the present disclosure is shown, where the method includes steps S101 to S104, where:
S101: generating a region space model corresponding to a target object; wherein the region space model comprises: a plurality of sub-region space models that an image acquisition device needs to reach when capturing images of the target object, and the poses respectively corresponding to the plurality of sub-region space models;
S102: controlling the image acquisition device to capture images of the target object based on the region space model to obtain a video of the target object;
S103: obtaining attribute annotation data for the video based on the video;
S104: generating target acquisition data based on the attribute annotation data and the video.
The following describes the details of S101 to S104.
With respect to S101 described above, the target object in the embodiments of the present disclosure may include, for example, tower-, column-, or pole-type equipment or a tall building. Illustratively, in an industrial scenario, the target object may include at least one of: a high-voltage power tower, a communication tower, a monitoring tower, an electric power tower, and a wind turbine. In other scenarios, the target object may also be a taller building, such as a high-rise, a sightseeing tower, or another structure. Different target objects differ not only in structure but also in height. For example, where the target object is a high-voltage electric tower, its height is, for example, in the range of 25 to 50 meters; where the target object is a wind turbine, its height is, for example, in the range of 65 to 70 meters. The specific height is determined by the actual height of the target object and is not limited here.
In one possible case, an area may contain a plurality of target objects to be annotated, for example a plurality of wind turbines in one area. Because these target objects are similar, each of them can be annotated with the data annotation method provided by the embodiments of the present disclosure. For example, when annotating a plurality of target objects, identification information corresponding to each target object may be used to determine whether that object has already been annotated, has completed annotation, or still has data to be annotated, and thus whether it is an object to be annotated in the area.
Illustratively, the embodiments of the present disclosure also provide a specific data annotation scenario. In this scenario, a specific target object captured by the image acquisition device may be, for example, the one shown in fig. 2, which is a schematic diagram of a target object provided by an embodiment of the present disclosure; fig. 2 (a) shows a front view of the target object, and fig. 2 (b) shows a top view of the target object.
The front view in fig. 2 (a) shows a plurality of regions of the target object, including: a tower 21, a lookout hole 22 in the tower (the lookout hole may be visible only in the front view), a sightseeing stand 23, an equipment room 24, and a tower tip 25 (the tower tip 25 may be regarded as a cone). In the top view shown in fig. 2 (b), corresponding to fig. 2 (a), the outermost contour line 26 is the top-view projection of the sightseeing stand 23, and the contour line 27 corresponds to the equipment room 24 and the tower tip 25; since the tower tip 25 is regarded as a cone, the position of its apex is also shown within the contour line 27. In addition, for convenience of illustrating the correspondence, fig. 2 (b) also marks, with parenthesized numbers, the regions corresponding to the respective parts of the top view.
Different regions of the target object differ in structure, specific function, and so on, so the data used to annotate them also differs. In order to annotate data for different regions of the target object, the data annotation method adopted in the embodiments of the present disclosure controls an image acquisition device to capture the target object from complete angles and then performs data annotation using the video obtained from that image acquisition. In order to obtain a video that covers the target object from complete angles, i.e., a video that can show every region of the target object, the embodiments of the present disclosure control the image acquisition device to capture the target object by generating a region space model corresponding to the target object. Here, the "region" in the region space model is different from the "region" in the target object: a "region" of the target object is obtained, for example, by dividing the target object into position regions with different functions, whereas a "region" of the region space model is a finer-grained region used to control the image acquisition device to capture different positions of the target object. Several "regions" of the region space model may correspond to the same region of the target object; for example, 2 "regions" of the region space model may correspond to the lookout hole, or 5 "regions" of the region space model may correspond to the tower tip at different angles, so that the tower tip can be captured from all directions through the region space model.
In the embodiments of the present disclosure, a specific embodiment of data annotation for a target object is provided; in this embodiment, the procedure includes, for example, the following steps ① to ⑥:
the method comprises the following steps: the target object type is determined.
When the user is prompted on the graphical user interface of the data processing device that the data processing device proceeds to this step (r), for example, step a: a target object type is selected.
In this step (r), the data processing apparatus may determine a target object type of the target object, for example, in response to a manual input operation by a user, or may also determine a target object type corresponding to the current target object from a plurality of object types that are candidates, in response to a selection operation by the user.
After determining the type of the target object, for example, parameter information of the target object, such as the device name and height of the target object, may also be determined accordingly.
For example, when the data processing apparatus selects the target object type shown in fig. 2, for example, the data processing apparatus may determine that the target object type of the target object is tower equipment, and correspondingly determine that the equipment name of the target object is "monitoring tower" and the height is 30 meters.
②: The image acquisition device is controlled to perform acquisition at a preset position relative to the target object.
When step ② is reached, the user may be prompted on the graphical user interface of the data processing device with, for example, step B: fly above the target object.
In step ②, when the region space model corresponding to the target object is generated, for example, the following manner may be adopted: detecting that the image acquisition device has reached the preset position relative to the target object, and acquiring first pose information of the image acquisition device; and generating the region space model based on the first pose information of the image acquisition device.
The image acquisition device may include, for example, an unmanned aerial vehicle. Because the unmanned aerial vehicle has limited computing power, it may serve only as the image acquisition device, and the images it acquires may be further processed by a data processing device connected to it, such as a mobile phone or a dedicated device.
In addition, the first pose information of the unmanned aerial vehicle may be obtained, for example, with the real-time kinematic (RTK) technique: the carrier phase collected by a reference station is received and a difference calculation is performed to obtain the coordinates, so the first pose information of the unmanned aerial vehicle can be determined efficiently and accurately. The calculation is not detailed here.
In particular, the unmanned aerial vehicle may communicate with the data processing device, for example, relying on a network connection. The network connections that may be relied upon may include, for example, a Fiber Ethernet Adapter (Fiber Ethernet Adapter), a mobile communication technology (e.g., a fourth generation mobile communication technology (4G) or a fifth generation mobile communication technology (5G)), and Wireless Fidelity (WiFi); the data processing device may for example also comprise a computer device as explained above.
In a specific implementation, when the image acquisition device is controlled to reach the preset position relative to the target object, for example, the following manner may also be adopted: controlling the image acquisition device to move above the target object and acquire an overhead view image of the target object; and, in a case where the target object is located in a preset area in the overhead view image, determining that the image acquisition device has reached the preset position relative to the target object.
For example, in practical applications, the data processing device may be a mobile phone and the image acquisition device may be an unmanned aerial vehicle. When the mobile phone controls the flight of the unmanned aerial vehicle, the relevant operation buttons on the mobile phone are used to control the flight, and the images collected and returned by the unmanned aerial vehicle in real time are displayed on the mobile phone's graphical display interface. In this way, when the unmanned aerial vehicle is controlled to fly above the target object, the overhead view image of the target object can be displayed to the user on the graphical display interface for the user to view.
Referring to fig. 3, a schematic diagram of the graphical display interface when displaying an overhead view image of the target object is provided for an embodiment of the present disclosure. Fig. 3 shows a step display area 31, in which step A (corresponding to step ①), which has already been performed, and step B (corresponding to step ②), which is currently being performed, are shown, so that the progress of data annotation can be shown directly to the user on the graphical display interface. Specifically, the graphical display interface shown in fig. 3 also shows, in full screen, the overhead view image taken by the unmanned aerial vehicle, so that the user can view the target object more easily and clearly through the graphical display interface. For ease of comparison, the top-view contour 26 of the sightseeing stand 23 is labeled correspondingly in the figure.
Here, in order to prompt a user such as a staff member to perform the operation of step B, a prompt message "please manually control the unmanned aerial vehicle to fly above the tower top" may further be shown on the graphical display interface.
In addition, in order to ensure that the unmanned aerial vehicle acquires the overhead view image from directly above the target object, an indication icon 32 and a corresponding prompt message "please make the tower tip appear in the circle" may, for example, be displayed on the graphical display interface, to help a user such as a staff member refer to the indication icon and keep the shooting area of the unmanned aerial vehicle limited to the area corresponding to the tower tip.
An indication icon 33 and a corresponding prompt message "please make the whole tower appear in the circle" are also displayed on the graphical display interface, to help a user such as a staff member refer to the indication icon and, through the data processing device, keep the unmanned aerial vehicle at a suitable flight altitude so that it can look down on the tower and acquire a clear and complete image. In a possible case, due to the limits of the shooting angle and shooting range of the unmanned aerial vehicle, if the height of the target object is 30 meters, the unmanned aerial vehicle can capture a complete and clear overhead view of the target object within the altitude range of 32 to 35 meters; within the altitude range of 30 to 32 meters the edges of the overhead view image may be missing, and above 35 meters the target object appears too small in the overhead view image and therefore unclear. By using the indication icon 33, the data processing device can avoid the problems of missing and unclear acquisition described above when controlling the image acquisition device, so that a complete and clear overhead view image of the target object can be acquired.
Here, in order to distinguish the indication icon 32 and the indication icon 33 from the overhead view image of the target object, they are displayed in bold in fig. 3.
In addition, the graphical display interface may further include an information display box 34 that displays the current status information of the unmanned aerial vehicle; the information display box 34 may display, for example, the current flight altitude of the unmanned aerial vehicle and the device pose (here, the device pose includes the first pose information of the image acquisition device and the second pose information described below). Similarly, a regional map 35 may also be displayed together with the current flight position of the unmanned aerial vehicle to ensure that the unmanned aerial vehicle flies safely.
In one possible embodiment, the data processing device may determine that the target object is located in the preset area in the overhead view image in response to, for example, a confirmation operation by the user. For example, the user may confirm, from the acquired image and the indication icons shown on the graphical display interface, whether the unmanned aerial vehicle is currently acquiring images from directly above the target object, and the corresponding confirmation operation by the user allows the data processing device to determine that the target object is located in the preset area in the overhead view image. Alternatively, the data processing device may automatically perform image recognition on the target object shown in the acquired image, determining whether the tip portion of the target object is located within the indication icon 32 and whether the target object is located within the indication icon 33, and thereby obtain a detection result indicating that the target object is located in the preset area.
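As a hedged illustration only (not part of the patent disclosure), the automatic check described above could look like the following sketch, assuming the tower tip point and the projected contour have already been detected in the overhead view image.

import math

def inside_circle(point, centre, radius):
    return math.dist(point, centre) <= radius

def overhead_position_ok(tip_point, tower_outline, icon32, icon33):
    # tip_point: detected (x, y) pixel position of the tower tip
    # tower_outline: (x, y) pixels on the target object's projected contour
    # icon32 / icon33: ((cx, cy), radius) of the two indication icons
    tip_ok = inside_circle(tip_point, *icon32)
    tower_ok = all(inside_circle(p, *icon33) for p in tower_outline)
    return tip_ok and tower_ok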
Referring to fig. 3, a control 36 for confirming that the target object is detected to be located in the preset area in the overhead view image may, for example, also be shown on the graphical display interface. When the user determines that the overhead view image currently captured by the unmanned aerial vehicle meets the requirement, the user may, for example, slide the control 36 to confirm the end of acquisition at the preset position of the target object; the data processing device may respond to the user's confirmation of the end of acquisition and proceed to the operation of confirming the region space model of the target object.
③: a regional space model of the target object is determined.
When the data processing device proceeds to step ③, the user may be prompted on its graphical user interface with, for example, step C: confirm the region space model.
In step ③, the data processing device may generate the region space model based on the first pose information of the image acquisition device. Specifically, for example, the following manner may be adopted: determining a bounding box corresponding to the target object in the overhead view image; determining a projection position of the region space model in the overhead view image based on the position of the bounding box in the overhead view image; determining poses respectively corresponding to a plurality of sub-region space models of the region space model based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the parameter information of the target object; and determining the region space model based on the poses respectively corresponding to the plurality of sub-region space models.
After the target object is determined, through step ②, to be located in the preset area in the overhead view image, the bounding box corresponding to the target object can be determined in the overhead view image. Here, the bounding box may include, for example, the minimum bounding box corresponding to the target object. In one possible case, because step ② includes framing the target object with the indication icon 33, if the projected contour of the target object in the overhead view image of step ② can be represented by the indication icon 33, the contour of the indication icon 33 can be used directly as the bounding box corresponding to the target object. This directly reuses the data processing result of the previous step, which reduces the data processing steps and improves data processing efficiency.
In another possible case, if the minimum bounding box of the target object is to be used as its bounding box, then when the target object is framed by the indication icon 33 in step ②, the projected contour of the target object may lie well inside the indication icon 33. For example, in fig. 3, the top-view contour of the sightseeing stand 23 (i.e., the contour line 26 marked in the figure) represents the projected contour of the target object; in step ② the indication icon 33 frames the contour line 26 but is not its minimum bounding box, so when the bounding box corresponding to the target object is determined in the overhead view image, the projected contour of the target object needs to be recognized in order to obtain the minimum bounding box.
Specifically, in this case, the projected contour of the target object may be recognized by image recognition, for example, and the contour line of the projected contour may be determined from the recognition result. In general, once the type of the target object is determined, the shape of its projected contour, for example a circle, can be determined accordingly, so after the projected contour is determined, a corresponding bounding box can be determined for the target object according to that shape.
In addition, even if the indication icons 32 and 33 in step ② ensure as far as possible that the unmanned aerial vehicle captures the overhead view image from above the target object, it is difficult to guarantee that the unmanned aerial vehicle is positioned exactly above the target object with an accurate pose and shoots with its optical axis parallel to the vertical line of the target object; under the influence of the pose of the unmanned aerial vehicle, the acquired overhead view image therefore may not match the top view of the target object exactly. In this case, when the bounding box is determined from the contour line of the projected contour of the target object, a certain margin may be added to the contour line, for example increasing the radius of the contour line by 0.5 meter, before the bounding box is determined, so as to ensure that the bounding box completely encloses the actual projected contour corresponding to the target object.
After the bounding box corresponding to the target object is determined, its position in the overhead view image can be determined, and the projection position of the region space model in the overhead view image can then be determined from the position of the bounding box in the overhead view image.
For example, after the position of the bounding box in the overhead view image is determined, it may be used directly as the projection position of the region space model in the overhead view image. Using the position of the bounding box directly in this way is simple and convenient and reduces the amount of computation. Alternatively, the radius may be enlarged on the basis of the determined bounding box, so that the contour represented by the projection position of the region space model in the overhead view image is larger than the bounding box; this prevents the region space model from being too close to, or directly attached to, the target object, which would otherwise create a flight safety risk of the unmanned aerial vehicle colliding with the target object when it later uses the region space model for image acquisition.
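Purely as an illustration (not part of the patent text), the projection position could be derived from the recognized contour with OpenCV's minimum enclosing circle plus the safety margin mentioned above; the ground-resolution parameter is an assumption.

import numpy as np
import cv2

def region_model_projection(contour_points, margin_m=0.5, metres_per_pixel=1.0):
    # contour_points: N x 2 array of (x, y) pixels on the projected contour
    # margin_m: extra radius in metres (0.5 m is the example given above)
    # metres_per_pixel: assumed ground resolution of the overhead view image
    pts = np.asarray(contour_points, dtype=np.float32)
    (cx, cy), r_px = cv2.minEnclosingCircle(pts)
    r_px += margin_m / metres_per_pixel
    return (cx, cy), r_px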
When the projection position of the region space model in the overhead view image is determined, the first pose information of the unmanned aerial vehicle when acquiring the overhead view image can be obtained synchronously, and the poses corresponding to the plurality of sub-region space models of the region space model are then determined together with the parameter information of the target object. The parameter information of the target object includes the height of the target object. In one possible case, because the height of the target object can already be determined when the type of the target object is selected in step ①, it need not be acquired or determined again here.
Specifically, referring to fig. 4, a flowchart of a specific method for determining poses corresponding to a plurality of sub-region space models of a region space model according to an embodiment of the present disclosure is shown, where:
s401: and determining an annular cylindrical model surrounding the target object based on the projection position of the region space model in the overhead view image, the first attitude information of the image acquisition equipment when the overhead view image is acquired, and the height of the target object.
Specifically, the projection position of the annular cylindrical surface model on the horizontal plane (here, the horizontal plane refers to the installation plane of the target object, for example, the plane of the ground in the area) may be determined from the projection position of the region space model in the overhead view image, and, based on the first pose information of the unmanned aerial vehicle when acquiring the overhead view image, the vertical line of the annular cylindrical surface model may be adjusted so that it lies on the same straight line as the vertical line of the target object. In addition, the height of the annular cylindrical surface model can be determined from the height of the target object. In this way, by determining the projection position on the ground, the vertical line, and the height of the annular cylindrical surface model, the form of the annular cylindrical surface model can be determined.
Illustratively, referring to fig. 5, a schematic diagram of a determined annular cylindrical surface model is provided for an embodiment of the present disclosure. Determining the projection position of the annular cylindrical surface model means determining the circle centre o and radius r of the projected contour marked in fig. 5 (a). The vertical line of the annular cylindrical surface model is, for example, the vertical line l marked in fig. 5 (a), which lies on the same straight line as the vertical line corresponding to the target object. The height h of the annular cylindrical surface model is, for example, 30 meters. In this way, the form of the annular cylindrical surface model shown in fig. 5 (a) can be obtained, that is, a cylindrical model built from an annular surface, i.e., a cylinder without its top and bottom faces.
In addition, a schematic diagram representing the positional relationship between the toroidal cylindrical model and the target object is also shown in fig. 5 (b). As can be seen from fig. 5 (b), the target object may be surrounded by the toroidal cylindrical model.
S402: and dividing the annular cylindrical surface model based on the size of a preset subregion space model to obtain a plurality of subregion space models.
The sub-region space model is, for example, a patch with a certain curvature. Referring to fig. 6, a schematic diagram of sub-region space models provided by an embodiment of the present disclosure is shown. Different sub-region space models have the same size, but because of their different poses, the shapes they present when viewed from the front differ. For example, the normal vector L1 of the sub-region space model shown in fig. 6 (a) points to the left, the normal vector of the sub-region space model shown in fig. 6 (b) points straight ahead, and the normal vector L2 of the sub-region space model shown in fig. 6 (c) points to the right.
The dimensions of the sub-area-space model can be characterized, for example, by the arc length of the sub-area-space model. Taking the sub-region space model shown in fig. 6 (b) as an example, the arc length s1 may be 0.5 meter or 1 meter, and the height s2 may be 0.3 meter or 0.5 meter, which may be determined specifically according to an actual situation or according to an actual size of the annular cylindrical surface model, and will not be described herein again.
Since the size of the sub-region space model can be determined, the annular cylindrical model can be divided by using the determined preset size of the sub-region space model. Exemplarily, referring to fig. 7, a schematic diagram of a plurality of sub-region space models corresponding to a toroidal cylinder model after the toroidal cylinder model is divided by using a preset sub-region space model size is provided for the embodiment of the present disclosure; since the number of the sub-region space models corresponding to the annular cylindrical surface model is large, only the sub-region space model 71, the sub-region space model 72, and the sub-region space model 73 are labeled.
Here, in order to better illustrate the possible poses and forms of the multiple sub-region space models in fig. 7, adjacent sub-region space models are not shown fitted together; when the annular cylindrical surface model is actually divided into multiple sub-region space models, the resulting sub-region space models may, for example, fit closely against one another.
S403: determining the pose of each sub-region space model according to the position of each sub-region space model in the annular cylindrical surface model and the pose of the annular cylindrical surface model.
Specifically, after the annular cylindrical surface model is divided, the positions of the plurality of sub-region space models in the annular cylindrical surface model can be determined, and the pose of each sub-region space model can then be solved from the pose of the annular cylindrical surface model.
In this way, since the plurality of sub-region space models constitute the region space model, the region space model described above is determined once the size of the sub-region space models and the pose corresponding to each sub-region space model are determined.
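For illustration only, the following Python sketch shows one possible way of dividing a toroidal cylindrical model with circle center o, radius r and height h into sub-region patches of a preset arc length s1 and height s2, and of deriving a center position and an outward normal vector for each patch as an approximation of its pose. The function and parameter names are assumptions made for this sketch and do not correspond to any actual implementation of the present disclosure.

```python
import math

def divide_toroidal_cylinder(center, radius, height, arc_len=0.5, patch_h=0.3):
    """Divide an annular cylindrical surface into sub-region patches.

    center  : (x, y, z) of the circle center o at the bottom of the cylinder
    radius  : radius r of the projection contour line
    height  : height h of the toroidal cylindrical model
    arc_len : preset arc length s1 of each sub-region space model
    patch_h : preset height s2 of each sub-region space model
    Returns a list of patches, each described by its layer index, center
    position and outward-pointing normal vector (an approximation of its pose).
    """
    cx, cy, cz = center
    n_around = max(1, round(2 * math.pi * radius / arc_len))   # patches per ring
    n_layers = max(1, round(height / patch_h))                 # rings from bottom to top
    patches = []
    for layer in range(n_layers):
        z = cz + (layer + 0.5) * patch_h                       # patch center height
        for k in range(n_around):
            theta = 2 * math.pi * k / n_around
            normal = (math.cos(theta), math.sin(theta), 0.0)   # points away from the cylinder axis
            pos = (cx + radius * normal[0], cy + radius * normal[1], z)
            patches.append({"layer": layer, "position": pos, "normal": normal})
    return patches
```

Each patch pose obtained in this way is expressed in the coordinate system of the annular cylindrical surface model and can be transformed, using the pose of that model, into the coordinate system used by the image acquisition device.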
In another embodiment of the present disclosure, referring to fig. 8, a schematic diagram of displaying a region space model on a graphical display interface is also provided. The schematic diagram shown in fig. 8 may be presented on the graphical user interface of the data processing device, for example, to prompt the user with information such as the current progress of data annotation and the region space model that has been established.
Specifically, the step currently in progress, shown in the step display area 31 in fig. 8, is step C: confirming the region space model. In addition, an interface displaying a preview of the region space model is shown, which contains the determined region space model 81, as well as a rendered image 82 showing a partial region of the target object and the corresponding part of the region space model displayed while the region space model is previewed.
In one possible case, the data processing device may provide a more comprehensive picture to the user on the graphical user interface, for example in response to a drag or similar operation by the user. Specifically, when the user drags the region space model 81, the data processing device may move the current preview position marked on it and display different regions of the region space model 81 to the user through the graphical user interface, so that the user can view those different regions. Moreover, in response to the user dragging to view the region space model 81, the corresponding region of the target object and of the region space model may also be presented accordingly in the rendered image 82 shown on the graphical user interface. After viewing the region space model 81 and the rendered image 82, if the user determines that the region space model 81 can completely surround the target object, the user triggers the control 83 by sliding; accordingly, the data processing device may determine, in response to the slide trigger operation, that the confirmation operation for this step is completed, and proceed to the next step.
For the above S102, in the case of determining the region space model corresponding to the target object, for example, the image capturing device may be controlled to capture an image of the target object, so as to obtain a video of the target object.
Here, when performing image acquisition to obtain a video of the target object, the data processing device may, for example, control the image acquisition device to capture a plurality of consecutive video frame images of the target object and thereby determine the video of the target object. That is, when image acquisition is performed on the target object, a plurality of video frame images of the target object are obtained, and the video of the target object is obtained accordingly. The obtained video or video frame images may be used for attribute annotation in the subsequent step S103; for details, refer to the detailed description of S103 below, which is not repeated here.
Specifically, the specific embodiment of performing data annotation for the target object further includes the following step:
Step ④: controlling the image acquisition device to perform image acquisition on the target object.
When the data processing device proceeds to step ④, the user may be prompted on the graphical user interface of the data processing device, for example, with step D: image acquisition.
In step ④, specifically, the video of the target object may be acquired, for example, in the following manner: controlling the image acquisition device to fly around the target object based on the poses respectively corresponding to the plurality of sub-region space models in the region space model, and controlling the image acquisition device to acquire a first video of the target object while reaching the image shooting areas respectively corresponding to the plurality of sub-region space models.
Specifically, take the case where the image acquisition device is an unmanned aerial vehicle as an example. Because the poses respectively corresponding to the plurality of sub-region space models in the region space model have been determined, the data processing device can control the unmanned aerial vehicle to fly to the image shooting areas respectively corresponding to the plurality of sub-region space models according to the poses of the different sub-region space models, so as to acquire the first video.
In one possible case, when the unmanned aerial vehicle is located on the straight line along the normal vector of a sub-region space model, at a certain distance from the sub-region space model (for example, within a distance range of 0.5 to 0.7 meters), it can perform clear and complete image acquisition of that sub-region space model; when the unmanned aerial vehicle is controlled to fly, this position can therefore be used as the image shooting area corresponding to the sub-region space model. In the embodiments of the present disclosure, the unmanned aerial vehicle reaching the image shooting area corresponding to a sub-region space model is accordingly referred to as the unmanned aerial vehicle reaching that sub-region space model.
Specifically, because the poses corresponding to the multiple sub-region space models differ, the image shooting areas corresponding to different sub-region space models also differ. When the unmanned aerial vehicle is controlled to fly around the target object, since it has already flown above the target object in step ②, it can first be controlled to fly to a sub-region space model in a higher layer of the region space model when the first video is acquired. For example, for the plurality of sub-region space models shown in fig. 7, the unmanned aerial vehicle can be controlled to fly near the sub-region space model 71 according to its position, and the position of the unmanned aerial vehicle can then be adjusted so that it captures images of the target object in the image shooting area corresponding to the sub-region space model 71.
After image acquisition of the part of the target object corresponding to the uppermost sub-region space model is completed, the flying height of the unmanned aerial vehicle can be adjusted so that it reaches the image shooting area corresponding to a sub-region space model in the second layer, for example the image shooting area corresponding to the sub-region space model 73 in fig. 7, and the unmanned aerial vehicle can then acquire images of the target object in that image shooting area. In this way, by controlling the unmanned aerial vehicle to reach the image shooting areas corresponding to the sub-region space models in the region space model corresponding to the target object, and acquiring the target object in these image shooting areas, complete image acquisition of the target object can be achieved.
Here, since the plurality of sub-region space models included in the region space model are adjacent in position, the strategy for controlling the unmanned aerial vehicle to fly around the target object can be determined according to the adjacency relationship of the sub-region space models, for example acquiring images of the target object layer by layer from top to bottom along the spiral flight trajectory described above. Planning the flight strategy in this way reduces repeated shooting of the image shooting areas corresponding to the sub-region space models, so that the image acquisition task for the target object is completed efficiently. A sketch of such waypoint planning is given below.
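As a hedged illustration of the flight strategy described above, the following sketch orders the sub-region patches from the top layer downward and offsets each waypoint from the patch center along its outward normal by an assumed shooting distance of 0.6 meters (within the 0.5 to 0.7 meter range mentioned above); the structure of the patch records follows the earlier sketch, and all names are illustrative rather than part of the disclosed implementation.

```python
import math

def plan_waypoints(patches, shoot_dist=0.6):
    """Produce a top-down, layer-by-layer sequence of drone waypoints.

    Each waypoint sits in the image shooting area of a patch, offset from the
    patch center along its outward normal, with the camera facing the patch."""
    ordered = sorted(
        patches,
        key=lambda p: (-p["layer"],                                   # top layer first
                       math.atan2(p["normal"][1], p["normal"][0]))    # sweep around each ring
    )
    waypoints = []
    for p in ordered:
        px, py, pz = p["position"]
        nx, ny, nz = p["normal"]
        waypoints.append({
            "position": (px + shoot_dist * nx, py + shoot_dist * ny, pz + shoot_dist * nz),
            "look_at": p["position"],   # camera aims at the patch, i.e. at the target object surface
        })
    return waypoints
```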
In addition, in one possible case, due to operation errors that may occur when controlling the unmanned aerial vehicle, or due to the influence of wind, there may be areas of the target object that were not acquired after the first video is obtained. For example, for the plurality of sub-region space models shown in fig. 7, if the image shooting area corresponding to the sub-region space model 72 was not reached when the first video was acquired, a complete image of the target object captured from that image shooting area cannot be obtained from the first video.
In this case, a second video may also be determined, for example, in the following manner: detecting whether there is a position range corresponding to a target sub-region space model that was not reached, based on the poses respectively corresponding to the plurality of sub-region space models in the region space model and the second pose information of the image acquisition device when acquiring the first video; and, in response to determining that there is a position range corresponding to an unreached target sub-region space model, controlling the image acquisition device to move to the position range corresponding to the target sub-region space model based on the pose of the target sub-region space model, and acquiring the second video.
Since the second pose information of the unmanned aerial vehicle can be determined while the first video is shot, it can be determined, according to the poses respectively corresponding to the multiple sub-region space models, whether the unmanned aerial vehicle performed image acquisition in the image shooting areas corresponding to all sub-region space models in the region space model, that is, whether there is a position range corresponding to an unreached target sub-region space model, and the target sub-region space models can be determined accordingly. In one possible case, multiple target sub-region space models may be determined and distributed at different positions of the region space model, so the planned flight route of the unmanned aerial vehicle for obtaining the second video can be determined according to the distribution of the multiple target sub-region space models in the region space model. The unmanned aerial vehicle then acquires images of the regions of the target object corresponding to the target sub-region space models again, thereby obtaining the second video. A sketch of detecting unreached sub-region space models is given below.
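A minimal sketch of detecting unreached target sub-region space models from the recorded second pose information is given below; the distance threshold and the data structures are illustrative assumptions rather than the disclosed implementation.

```python
def find_unreached_patches(patches, drone_positions, shoot_dist=0.6, tol=0.3):
    """Return the patches whose image shooting area was never visited.

    patches        : sub-region patches with 'position' and 'normal' entries
    drone_positions: sequence of (x, y, z) positions recorded while shooting the first video
    shoot_dist     : nominal distance of the shooting area along the patch normal
    tol            : how close the drone must have come for the patch to count as covered
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    unreached = []
    for p in patches:
        px, py, pz = p["position"]
        nx, ny, nz = p["normal"]
        shoot_pos = (px + shoot_dist * nx, py + shoot_dist * ny, pz + shoot_dist * nz)
        if all(dist(shoot_pos, d) > tol for d in drone_positions):
            unreached.append(p)
    return unreached
```

The image shooting areas of the returned patches can then be used to plan the flight route of the unmanned aerial vehicle for acquiring the second video.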
In addition, since image data of some regions may be missing, images may be blurred, or images corresponding to some regions may fail to show the details of the target object when the first video and/or the second video are captured, it is also possible, in response to a user's viewing and confirmation operation on the first video and/or the second video, to determine whether there are regions that require supplementary acquisition, and to control the unmanned aerial vehicle to perform supplementary acquisition according to the positions of those regions. In one possible case, the video obtained by such supplementary acquisition can also be used as part of the second video.
Here, the obtained second video may be a supplement to the first video, and therefore, in one possible case, the video of the target object obtained by image-capturing the target object includes the first video and the second video. In another possible case, if the complementary acquisition is not needed, the video obtained by acquiring the image of the target object includes the first video.
For the above S103, when obtaining the attribute annotation data of the video based on the video, specifically, the specific embodiment of performing data annotation for the target object includes the following step:
Step ⑤: determining the annotation data.
When the data processing device proceeds to this step, the user may be prompted on the graphical user interface of the data processing device, for example, with step E: data annotation.
In this step, specifically, when acquiring the attribute annotation data of the video based on the video, for example, the following manner may be adopted: determining key frame images from the video; and, in response to attribute annotation being performed on the object to be annotated in the key frame images, generating the attribute annotation data of the video based on the annotation data obtained by the attribute annotation.
Specifically, when determining key frame images from the video, for example, either of the following two manners (A1) and (A2) may be employed, although the manners are not limited to these two:
(A1): determining a preset number of key frame images according to the number of video frame images contained in the video and the actual data annotation requirement.
For example, when a video contains 100 video frame images and it is determined that 10 key frame images are to be annotated, provided that the annotation data in all 100 video frame images of the video can still be determined effectively and accurately, the preset number of key frame images may be set to 10, and the 10 key frame images may be determined at equal frame-number intervals among the 100 video frame images, for example the 1st, 11th, 21st, …, 81st, and 91st frames.
Therefore, the key frame images can be determined from the multi-frame video frame images contained in the video more easily and conveniently, and the number of the key frame images can meet the requirement of subsequent actual data annotation.
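A minimal sketch of manner (A1), selecting a preset number of key frame images at an equal frame-number interval, might look as follows; the function name and the 0-based indexing are assumptions of this sketch.

```python
def select_key_frames(num_frames, num_key_frames):
    """Pick key frame indices at an (approximately) equal frame-number interval.

    For num_frames=100 and num_key_frames=10 this returns indices
    0, 10, 20, ..., 90, i.e. the 1st, 11th, 21st, ..., 91st frames."""
    if num_frames <= 0 or num_key_frames <= 0:
        return []
    step = max(1, num_frames // num_key_frames)
    return list(range(0, num_frames, step))[:num_key_frames]
```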
(A2): determining the key frame images in the video in response to a user's selection of video frame images in the video.
In a specific implementation, when the video is presented to the user, for example in response to the user's selection operation on some of the video frames, the video frames selected by the user are used as the key frame images in the video.
For example, when the video is presented to the user, a prompt for selecting key frame images may be displayed. Specifically, the user may select a video frame image in the video through a specific operation such as a long press or a double click, and the selected video frame image is used as a key frame image. In addition, prompt information may be presented to the user, for example a message containing the text "long-press to select a video frame image"; when the user long-presses any video frame image in the video, that video frame image is taken as a key frame image.
In this way, the key frame images can be selected from the video more flexibly in response to the user's operations. In one possible case, the regions to be annotated on the target object may be concentrated in part of the video: consecutive video frames may contain no region to be annotated, while the regions to be annotated are concentrated in other frames. Manually selecting the video frame images avoids taking frames without regions to be annotated as key frames, thereby improving efficiency when the key frames are subsequently used for data annotation. In another possible case, if some video frames are unclear or their data is damaged, manually selecting the video frame images also avoids taking such frames as key frames.
In the case of determining a key frame image in a video, for example, attribute annotation may be performed on an object to be annotated in the key frame image. Here, since the video acquired by image capturing the target object includes the key frame image capable of performing attribute labeling, the process of performing attribute labeling on the video may also be considered as a process of performing attribute labeling on the video when labeling the key frame image.
Specifically, when performing attribute annotation on an object to be annotated in a key frame image, for example, the following manners (B1) to (B3) may be adopted:
(B1): displaying the key frame image; and, in response to a first annotation operation on the object to be annotated in the key frame image, generating annotation data corresponding to the annotation operation.
When the key frame image is presented to the user, for example, the key frame image having the same size (in pixels) as the video may be presented to the user.
Specifically, when the key frame image is presented to the user, for example, the annotation control required when the first annotation operation is performed on the key frame image may also be provided to the user at the same time. Illustratively, referring to fig. 9, a schematic diagram of a graphical display interface when displaying a key frame image and labeling a control according to an embodiment of the present disclosure is provided.
For ease of explanation, the specific process in which the user actually fills in the annotation data is taken as an example. For instance, the user can determine the position for data annotation on the graphical display interface through a click operation. Referring to fig. 9, in the key frame image 91, the user can select a position 92 by clicking, and then fill in the relevant annotation data in the data annotation area 93 corresponding to the selected position 92. The different regions can be regarded, for example, as devices in the target object that are to be annotated.
For the data annotation area 93, fig. 9 shows some of the annotation data types it may contain, such as device name, service life, specific function, device person in charge, device manufacturer, device size specification, and related text remarks. When filling in data corresponding to different data types, as shown in area 93 in fig. 9, the data may be entered directly as text, for example entering the characters "lookout hole" in the text input box under the device name, or entering "scene survey window" in the text input box under the specific function. Alternatively, a plurality of selectable input options may be provided to the user; for example, in response to the user clicking the selection box under service life, a pull-down menu containing several different durations, such as "1 year", "2 years", and "3 years", is presented, so that the user can determine the input data under service life by selecting one of these items.
In one possible case, the data annotation area 93 may also include a custom annotation field, such as the "custom annotation field 1" shown in the data annotation area 93. In response to the user editing the custom annotation field, a new annotation data type can be determined and a new input box generated, which makes data annotation more flexible. In another possible case, because the content that can be displayed on the graphical display interface is limited, the data annotation area 93 further includes a slider 94 for the user. In response to the user sliding the screen up and down in the graphical display interface, the slider 94 displays a corresponding up-and-down sliding effect to indicate the current position within the data annotation area. By sliding the screen in this way, the limitation of the graphical display interface's size is removed and more space for writing annotation data is provided to the user.
In another possible case, the key frame image 91 may be used as a part of the data annotation region 93. Illustratively, for example, a selection box corresponding to "whether to use the key frame image as annotation data" may be displayed in the data annotation region 93, and the key frame image may be automatically used as annotation data in response to a user's selection operation on the selection box. Therefore, the original key frame image can be reserved in the annotation data, so that whether the annotation data has annotation errors or not can be checked back and corrected in time by calling the key frame image.
In this way, the data processing apparatus can determine the annotation data for the target object in response to the actual operation by the user.
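For illustration only, the annotation data gathered through the data annotation area could be organized as a simple record such as the following; the field names mirror the annotation data types mentioned above, and the structure itself is a hypothetical example rather than a format prescribed by the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class DeviceAnnotation:
    """One annotation entry filled in for a position selected in a key frame image."""
    frame_index: int                       # which key frame the annotation belongs to
    position: Tuple[int, int]              # pixel position selected by the click operation
    device_name: str = ""
    service_life: str = ""
    specific_function: str = ""
    person_in_charge: str = ""
    manufacturer: str = ""
    size_specification: str = ""
    text_remark: str = ""
    custom_fields: Dict[str, str] = field(default_factory=dict)  # user-defined annotation fields
    key_frame_image_path: Optional[str] = None  # set when the key frame image itself is kept as annotation data
```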
(B2): performing semantic segmentation processing on the key frame image, and generating the annotation data of the object to be annotated based on the result of the semantic segmentation processing.
When the data processing device performs semantic segmentation processing on the key frame image, the algorithms it may adopt include, for example, at least one of the following: Convolutional Neural Networks (CNNs) and deep self-attention Transformer networks (Transformers).
After performing semantic segmentation processing on the key frame image, for example, a plurality of regions in the key frame image may be identified, and specific types of the plurality of regions may be determined. Specifically, for example, the description of the semantic segmentation processing in (B3) below may be referred to, and will not be described here.
After the specific types of the plurality of areas are determined, the relevant information of the area, such as at least one of the above-described device name, device service life, device specific function, device responsible person, device manufacturer, device size specification, and relevant text remark, may be retrieved from the corresponding database according to the specific types of the areas, and the determined relevant information is used as the labeling data of the object to be labeled.
Therefore, data annotation can be carried out on equipment and the like in the key frame image more efficiently in a semantic segmentation mode, manual intervention is not needed, and efficiency is higher.
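The following sketch illustrates how a semantic segmentation result could be turned into annotation data by looking up each recognized region type in a database of device attributes; the segmentation model interface and the attribute database are placeholders assumed for this sketch, not an actual API of this disclosure.

```python
def annotate_by_segmentation(key_frame_image, segmentation_model, attribute_db):
    """Run semantic segmentation and look up annotation data for each region type.

    segmentation_model: any callable returning a list of (region_mask, class_name) pairs
    attribute_db      : mapping from class_name to its stored attributes
                        (device name, service life, specific function, ...)
    """
    annotations = []
    for region_mask, class_name in segmentation_model(key_frame_image):
        attributes = attribute_db.get(class_name, {})   # retrieved relevant information
        annotations.append({
            "class": class_name,
            "mask": region_mask,
            "attributes": attributes,
        })
    return annotations
```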
(B3): generating a preview image based on the key frame image and displaying the preview image, wherein the resolution of the preview image is lower than the resolution of the key frame image; in response to a second annotation operation on the object to be annotated in the preview image, generating annotation data corresponding to the preview image; and obtaining the annotation data of the key frame image based on the annotation data corresponding to the preview image.
The key frame image is relatively large, and the user does not need high image definition when performing data annotation. Therefore the size of the key frame image in the video can reasonably be reduced, for example by generating a preview image corresponding to the key frame image and displaying the preview image to the user. The user can still clearly identify each region of the preview image through the graphical display interface, while the amount of data transmitted and processed after the preview image is annotated is reduced, which improves transmission efficiency and data processing efficiency. A minimal downscaling sketch follows.
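Generating the lower-resolution preview image can, for example, be done by downscaling the key frame image; the sketch below uses OpenCV for this, with an assumed scale factor of 0.25.

```python
import cv2

def make_preview(key_frame_image, scale=0.25):
    """Downscale a key frame image into a preview image with lower resolution."""
    h, w = key_frame_image.shape[:2]
    new_size = (max(1, int(w * scale)), max(1, int(h * scale)))   # (width, height) for cv2.resize
    return cv2.resize(key_frame_image, new_size, interpolation=cv2.INTER_AREA)
```

Annotation positions made on the preview image can be mapped back to the key frame image by dividing the coordinates by the same scale factor.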
In a possible case, when performing the second annotation operation on the object to be annotated in the preview image, for example, a similar manner to the above (B1) can be adopted, and details are not repeated here. In another possible case, the second labeling operation may be performed by using the semantic segmentation process in combination with (B2) above. For example, referring to fig. 10, a schematic diagram of a graphical display interface for displaying a preview image provided by the embodiment of the present disclosure may assist a user to view semantic segmentation results respectively corresponding to different regions of a target object when displaying the preview image to the user, and the user may determine whether the classification is accurate.
Illustratively, a preview image 1001 corresponding to a key frame image may be shown in fig. 10, for example. After the key frame image is subjected to semantic segmentation processing in the manner described in (B2) above, a plurality of regions included in preview image 1001, including the tower, the lookout hole in the tower, the sightseeing stand, the equipment room, and the tower tip, can be specified.
After the semantic segmentation result is obtained, for example, in response to a user's selection operation on any one of the regions, the selected device name may be displayed in the identification mark region 1002 on the right side of the preview image 1001. Taking the example that the selected area includes the area 1003 (the area 1003 is shown in a dashed line for identification), since the selected area 1003 is determined to be of a specific type by semantic segmentation, the semantic segmentation result "sightseeing desk" corresponding to the selected area 1003 can be directly displayed in a text box below the "selected device name".
In this way, in one possible case, the user can check the current data annotation object again by means of the key frame image and the selected device name in the corresponding identification annotation area 1002, and the data processing device responds to the user's check operation, so that data annotation errors can be reduced. In another possible case, if the semantic segmentation fails to produce a correct recognition result, the selected device name can be adjusted in response to the user's modification, so as to ensure the correctness of the data annotation.
In addition, the identification annotation area 1002 further includes other annotation data types similar to those in the data annotation area 93 in fig. 9, and a slider 1004 is correspondingly provided; reference may be made to the corresponding description of fig. 9, which is not repeated here.
After the key frame image is subjected to data annotation, the attribute annotation data of the video can be generated according to the annotation data obtained based on the attribute annotation. Specifically, for example, the following manner may be adopted: for any frame of video frame image in the video, in response to detecting that the frame of video frame image is a non-key frame image, determining a target key frame image corresponding to the frame of video frame image from the key frame images; and generating the annotation information of the frame of video frame image based on the annotation information of the target key frame image.
Specifically, for the multiple video frame images contained in the video, the specific positions of the key frames in the video can be determined. For convenience of description, assume the video contains 5 video frame images, of which the 1st and 3rd frames are key frame images. The non-key-frame video frame images in the video are then the 2nd, 4th, and 5th frames.
In a specific implementation, when determining the target key frame image for a video frame image other than a key frame image, for example, the following manner may be adopted: determining the target key frame image for that video frame image from the key frame images, based on the first position of the key frame images in the video and the second position of that video frame image in the video.
The video containing 5 video frame images described above is taken as an example. The first position of a key frame image in the video may be represented, for example, by its frame number in the video; for example, among the 5 video frame images, the first position corresponding to the 1st key frame image is the 1st frame. Similarly, the second position of a non-key-frame video frame image in the video can also be characterized directly by its frame number in the video; for example, among the 5 video frame images, the second position corresponding to the 2nd video frame image is the 2nd frame.
After the position of each video frame image in the video is determined, the target key frame image corresponding to each non-key-frame video frame image can be determined. Specifically, the most recent key frame image located before that video frame image may be used as its target key frame image.
For example, when determining the target key frame image for the 2nd video frame image, the target key frame image is chosen among the key frame images located before the 2nd video frame image. Here, only one key frame image precedes the 2nd video frame image, namely the 1st frame, so the 1st frame is used as the target key frame image corresponding to the 2nd video frame image. When determining the target key frame image for the 5th video frame image, the target key frame image is chosen among the key frame images located before the 5th video frame image. Here, two key frame images precede the 5th video frame image, of which the key frame image nearest to the 5th video frame image is the 3rd frame, so the 3rd frame is taken as the target key frame image corresponding to the 5th video frame image.
Here, in one possible case, when the key frame images in the video are determined, the video frame image in which a region not previously shown in the video first appears is taken as a key frame image. For a non-key-frame video frame image, the region contained in the nearest preceding key frame image can therefore be considered closest to the region shown in that video frame image, so taking the nearest preceding key frame image as its target key frame image allows the annotation data to be synchronized to the non-key-frame video frame images more accurately in the subsequent steps. The sketch below illustrates this selection.
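A minimal sketch of choosing, for a non-key-frame video frame image, the nearest preceding key frame image as its target key frame image (frame indices counted from 0, as an assumption of this sketch):

```python
def target_key_frame(frame_index, key_frame_indices):
    """Return the index of the nearest key frame at or before frame_index.

    For key frames [0, 2] (the 1st and 3rd frames) and frame_index 4
    (the 5th frame), this returns 2, i.e. the 3rd frame."""
    candidates = [k for k in sorted(key_frame_indices) if k <= frame_index]
    return candidates[-1] if candidates else None
```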
Specifically, when generating the annotation information of a video frame image based on the annotation information of its target key frame image, for example, a key point tracking method may be adopted to synchronize the data annotated in the target key frame image into that video frame image.
In this way, once the data annotation of the key frame images is completed, the data can be synchronized into the other video frame images by means of key point tracking. In one possible case, some regions require more detailed annotation; in the preceding annotation step, for example, only part of a region may be annotated, and the same region in the other video frame images can then be annotated by key point tracking, so that the more detailed annotation is completed during data annotation.
After the synchronization of the labeling data of the video frame images of all the non-key frame images in the video is completed, the video with the completed data labeling can be obtained.
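One common realization of the key point tracking mentioned above is sparse optical flow, for example the pyramidal Lucas-Kanade tracker in OpenCV; the sketch below is only one possible way to propagate annotated positions from a target key frame image into another video frame image, and it is not asserted to be the implementation of the present disclosure.

```python
import cv2
import numpy as np

def propagate_points(key_frame_gray, frame_gray, annotated_points):
    """Track annotated pixel positions from a key frame into another video frame.

    key_frame_gray  : grayscale target key frame image
    frame_gray      : grayscale non-key-frame video frame image
    annotated_points: iterable of (x, y) positions annotated in the key frame
    Returns the tracked positions and a boolean mask of successfully tracked points."""
    pts = np.float32(annotated_points).reshape(-1, 1, 2)
    tracked, status, _err = cv2.calcOpticalFlowPyrLK(key_frame_gray, frame_gray, pts, None)
    ok = status.reshape(-1) == 1
    return tracked.reshape(-1, 2), ok
```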
In another embodiment of the present disclosure, a specific embodiment of data annotation is also provided. Referring to fig. 11, a flowchart of a specific embodiment of the present disclosure is provided when performing data annotation, where:
s1101: and opening a data labeling environment.
Specifically, the data annotation method provided by the embodiment of the present disclosure may be applied to an Application (APP). After the APP is opened, the data processing device may correspondingly open a data annotation environment, for example, an entry for providing data acquisition and annotation; after the portal is started, a Software Development Kit (SDK) is called.
S1102: and acquiring data of the site, and determining an acquisition task list corresponding to the site.
Wherein the site is, for example, a site where many devices are installed; the station may include, but is not limited to, at least one machine room, and at least one of indoor control cabinet equipment deployed in the machine room, a tower installed on the ceiling of the machine room, and an outdoor control cabinet.
After invoking the SDK, the data processing device may obtain the site data accordingly. Specifically, corresponding to different sites, an Identifier (ID) corresponding to a site may be used as a unique identification identifier, a site for data annotation is determined in response to an input identifier, a two-dimensional code of the site is scanned to obtain the identifier, and data required by an asset platform and a generation platform corresponding to the site is transmitted to create a current collection task list. When data annotation is performed on the site, the data to be annotated on the site can be determined according to the relevant tasks in the collection task list.
S1103: and determining the latest data labeling attribute packet through a network request attribute platform.
Specifically, after the current collection task list is created, the attribute platform may be requested through the network to determine whether the data annotation attribute package is updated. When there is an updated data tagging attribute package, for example, the latest data tagging attribute package may be downloaded in an APP update manner, so as to call data such as tagging attributes in the data tagging attribute package in a specific process of data tagging in the following. In the case that there is no updated data tagging attribute packet, S1104 may be continuously executed, that is, the collection process is normally performed.
S1104: and controlling the image acquisition equipment to acquire images of the target space to obtain a video.
For a specific description of this step, reference may be made to the above description of S102, which is not repeated here.
S1105: and determining the marking data of the video frame image contained in the video by using the data marking attribute packet to obtain the attribute marking data of the video.
For a specific description of this step, reference may be made to the above description of S102 and S103, and details are not repeated here again.
Here, in steps S1104 and S1105, the image acquisition and/or the attribute annotation can be performed in an offline state without depending on a network connection, for example by performing image acquisition with the image acquisition device and performing attribute annotation with the data annotation attribute package.
S1106: confirming whether the attribute marking data is correct or not; if yes, go to step S1107; if not, go to step S1108.
S1107: and uploading attribute marking data.
When uploading the attribute marking data, for example, the network connection may be waited, and the attribute marking data may be uploaded sequentially after the network connection is successful. The specific manner of uploading the attribute labeling data can be referred to the following description of S104, and will not be described in detail here.
S1108: determining whether an image acquisition error exists; if yes, return to step S1104; if not, return to step S1105.
In this way, the task of determining the annotation data corresponding to step ⑤ can be completed.
For the above S104, in another embodiment of the present disclosure, the second pose information of the image acquisition device when acquiring the video may also be obtained. In this case, when generating the target acquisition data based on the attribute annotation data and the video, for example, the following manner may be adopted: generating the target acquisition data based on the attribute annotation data, the video, and the second pose information of the image acquisition device when acquiring the video.
Specifically, the specific embodiment of performing data annotation for the target object further includes the following step:
Step ⑥: uploading the data.
When the data processing device proceeds to step ⑥, the user may be prompted on the graphical user interface of the data processing device, for ease of distinction, for example with the step: data uploading.
In step ⑥, specifically, the attribute annotation data, the video, and the second pose information may, for example, be combined, according to at least one of their timestamps, into corresponding target acquisition data under different timestamps; alternatively, the attribute annotation data, the video, and the second pose information may be directly packed, according to at least one of their timestamps, into an acquisition data packet to be finally submitted, and this acquisition data packet is used as the target acquisition data.
In addition, in a possible case, since the number of video frames included in a video may be large, and the size of a video frame image in the video is large, the amount of data to be transmitted when data transmission is performed is also large, and therefore, if a complete video is uploaded, it takes a long time, which causes a problem of low transmission efficiency. Therefore, the frame extraction processing can be carried out on the acquired original video, and the video data with the extracted partial frames is used as partial data in the target acquisition data, so that the data volume required to be transmitted is effectively reduced, and the data transmission efficiency is improved.
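As an illustration of the frame extraction and packing described above, the following sketch keeps one video frame out of every n and bundles it with the attribute annotation data and the second pose information by timestamp before upload; the packaging format and the parameter values are assumptions made for this sketch.

```python
def build_target_acquisition_data(frames, annotations, poses, keep_every=5):
    """Pack frame-extracted video data with annotations and poses by timestamp.

    frames      : list of (timestamp, video_frame_image) in playback order
    annotations : mapping timestamp -> attribute annotation data
    poses       : mapping timestamp -> second pose information of the capture device
    keep_every  : keep one frame out of every `keep_every` frames to reduce the upload size
    """
    packed = []
    for i, (ts, image) in enumerate(frames):
        if i % keep_every != 0:
            continue                      # frame extraction: drop intermediate frames
        packed.append({
            "timestamp": ts,
            "frame": image,
            "annotation": annotations.get(ts),
            "pose": poses.get(ts),
        })
    return packed
```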
In another embodiment of the present disclosure, a specific embodiment of performing data annotation on a tower is further provided. In this embodiment, the data processing device is a mobile phone and the image acquisition device is an unmanned aerial vehicle. In order to distinguish the specific operation steps of the data processing device and the image acquisition device when data annotation is actually performed, the specific processes on the unmanned aerial vehicle side and on the mobile phone side during data annotation are described in detail. The data annotation method provided by the embodiments of the present disclosure, as applied in this embodiment, can be divided into three stages: stage I: a data acquisition stage, stage II: a data annotation stage, and stage III: a data uploading stage. Referring to fig. 12, a flowchart corresponding to an embodiment of annotating data in a machine room according to an embodiment of the present disclosure is shown, wherein:
Stage I: the data acquisition stage, comprising the following steps S1201 to S1225; wherein,
s1201: and the mobile phone starts a data labeling environment.
S1202: the mobile phone sends a connection request to the unmanned aerial vehicle.
S1203: and the unmanned aerial vehicle receives the connection request and is connected with the mobile phone.
S1204: the mobile phone confirms that the connection with the unmanned aerial vehicle is completed.
S1205: the handset determines the type of tower.
S1206: and adjusting the pose of the unmanned aerial vehicle.
S1207: the mobile phone confirms that the unmanned aerial vehicle reaches the preset position.
S1208: and starting collection by the mobile phone.
In step S1208, the method includes a step of determining to turn on data acquisition by the mobile phone, and a step of sending a control command to the unmanned aerial vehicle to prepare for starting image acquisition.
S1209: the unmanned aerial vehicle confirms the start of the acquisition.
S1210: the unmanned aerial vehicle continuously generates the collected data.
In this step, for example, a. the preview video stream acquired in real time, as noted in fig. 12, may be generated.
S1211: and (4) wireless connection transmission.
In this step, for example, the preview video stream acquired in real-time a can be synchronized to the handset.
S1212: the unmanned aerial vehicle automatically avoids obstacles.
In this step, the unmanned aerial vehicle may determine an obstacle avoidance strategy according to, for example, a. the preview video stream acquired in real time and b. the video frame images acquired in real time, so as to ensure that it avoids colliding with the tower and flies safely.
S1213: calculating RTK (real-time kinematic) positioning in real time.
S1214: calculating the region space model in real time.
S1215: judging the pose of the unmanned aerial vehicle in real time.
S1216: displaying the real-time preview result on the graphical display interface.
S1217: finishing the acquisition.
In this step, for example, g. the synchronized preview video stream whose acquisition has been completed and h. the synchronized set of preview acquisition video streams may be generated.
Further, after steps S1212 and S1217, the method further includes:
s1218: the acquisition is stopped.
In step, c. real-time captured video frame images and d. capture completed video frame image sets noted in fig. 12 may be generated, for example.
After step S1217, the method further includes:
s1219: and receiving a manual confirmation operation instruction for acquiring the coverage area.
S1220: determining whether additional mining is needed; if yes, go to step S1221; if not, go to step S1222.
S1221: and (5) starting supplementary mining.
S1222: and finishing the collection.
The data obtained in this step may also be used as part of the set of h.
After step S1221, the method further includes:
s1223: and controlling the unmanned aerial vehicle to collect data.
After this step is completed, step S1216 may be executed accordingly, and the result may be obtained as a part of the set of c.
After step S1218, the method further includes:
s1224: and (4) wireless connection transmission.
In the step, the video frame image acquired in real time in the step c can be transmitted to the mobile phone, so that the video frame image after e-synchronization is obtained.
S1225: and (4) wireless connection transmission.
In this step, the video frame image set after d. acquisition can be transmitted to the mobile phone, so as to obtain the video frame image set after f. synchronization.
Stage II: the data annotation stage, comprising the following steps S1226 to S1229; wherein,
S1226: displaying the set of preview video streams to the worker.
In this step, the worker may view the set of preview videos on the graphical user interface.
S1227: determining the key frame images used for data annotation.
S1228: annotating the regions to be annotated with data.
S1229: determining that data annotation has been completed for all regions of the tower.
In this step, i. annotation data matching the video may also be generated.
Stage III: the data uploading stage, comprising the following steps S1230 to S1235; wherein,
S1230: modifying the file name according to the site and/or the time.
The file described here may be, for example, a file storing the annotation data, the site information, the video, and the like.
S1231: selecting the files to be uploaded.
S1232: data compression.
In this step, j. the compressed video, video frame images and preview video frame images, and k. the annotation data corresponding to the video frame images can be generated.
S1233: packaging all the files.
S1234: determining the upload progress.
S1235: determining that the upload is finished.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, a data labeling device corresponding to the data labeling method is also provided in the embodiments of the present disclosure, and as the principle of solving the problem of the device in the embodiments of the present disclosure is similar to the data labeling method in the embodiments of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 13, a schematic diagram of a data annotation device provided in an embodiment of the present disclosure is shown. The device includes: a generating module 131, a control module 132, an obtaining module 133, and a first processing module 134; wherein,
the generating module 131 is configured to generate a region space model corresponding to the target object, wherein the region space model comprises: a plurality of sub-region space models and poses respectively corresponding to the plurality of sub-region space models; the control module 132 is configured to control an image acquisition device to perform image acquisition on the target object based on the region space model, so as to obtain a video of the target object; the obtaining module 133 is configured to obtain attribute annotation data of the video based on the video; and the first processing module 134 is configured to generate target acquisition data based on the attribute annotation data and the video.
In an alternative embodiment, the generating module 131, when generating the region space model corresponding to the target object, is configured to: detecting that the image acquisition equipment reaches a preset position relative to the target object, and acquiring first position and attitude information of the image acquisition equipment; generating the region space model based on first pose information of the image acquisition device.
In an optional embodiment, the data annotation device further includes a second processing module 135, configured to: control the image acquisition device to move above the target object to acquire an overhead view image of the target object; and, in a case where the target object is located in a preset area of the overhead view image, determine that the image acquisition device has reached the preset position relative to the target object.
In an alternative embodiment, the generating module 131, when generating the region space model based on the first pose information of the image acquisition device, is configured to: determine a bounding box corresponding to the target object in the overhead view image; determine a projection position of the region space model in the overhead view image based on the position of the bounding box in the overhead view image; determine poses respectively corresponding to a plurality of sub-region space models of the region space model based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and parameter information of the target object; and determine the region space model based on the poses respectively corresponding to the plurality of sub-region space models.
In an optional embodiment, the parameter information of the target object includes a height of the target object; the generating module 131, when determining the poses respectively corresponding to the plurality of sub-region space models of the region space model based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the parameter information of the target object, is configured to: determine an annular cylindrical model surrounding the target object based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the height of the target object; divide the annular cylindrical surface model based on a preset sub-region space model size to obtain the plurality of sub-region space models; and determine the pose of each sub-region space model according to the position of each sub-region space model in the annular cylindrical surface model and the pose of the annular cylindrical surface model.
In an alternative embodiment, the control module 132, when controlling the image acquisition device to perform image acquisition on the target object based on the region space model to obtain the video of the target object, is configured to: control the image acquisition device to move around the target object based on the poses respectively corresponding to the plurality of sub-region space models in the region space model, and control the image acquisition device to acquire a first video of the target object while reaching the image shooting areas respectively corresponding to the plurality of sub-region space models.
In an optional embodiment, the control module 132, when controlling the image capturing device to capture the image of the target object based on the area space model to obtain a video of the target object, is configured to: detecting whether a position range corresponding to a target subregion space model which is not reached exists or not based on poses respectively corresponding to a plurality of subregion space models in the region space model and second pose information of the image acquisition equipment when the first video is acquired; and in response to the fact that the position range corresponding to the target sub-region space model which is not reached exists, controlling the image acquisition equipment to move to the position range corresponding to the target sub-region space model based on the pose of the target sub-region space model, and acquiring a second video.
In an optional embodiment, when acquiring the attribute annotation data of the video based on the video, the acquiring module 133 is configured to: determining key frame images from the video; and generating attribute marking data of the video based on marking data obtained by attribute marking in response to attribute marking of the object to be marked in the key frame image.
In an optional embodiment, the obtaining module 133, when performing attribute annotation on the object to be annotated in the key frame image, is configured to: display the key frame image in the video; and, in response to a first annotation operation on the object to be annotated in the key frame image, generate annotation data corresponding to the first annotation operation.
In an optional implementation manner, when performing attribute annotation on an object to be annotated in the key frame image, the obtaining module 133 is configured to: and performing semantic segmentation processing on the key frame image, and generating annotation data of the object to be annotated based on the result of the semantic segmentation processing.
In an optional implementation manner, when performing attribute annotation on an object to be annotated in the key frame image, the obtaining module 133 is configured to: generating a preview image based on the key frame image, and displaying the preview image; wherein the preview image has a resolution lower than the resolution of the key frame image; responding to a second labeling operation on the object to be labeled in the preview image, and generating labeling data corresponding to the preview image; and obtaining the annotation data of the key frame image based on the annotation data corresponding to the preview image.
In an optional embodiment, the obtaining module 133, when generating the attribute annotation data of the video based on the annotation data obtained by the attribute annotation, is configured to: for any video frame image in the video, in response to detecting that the video frame image is a non-key-frame image, determine a target key frame image corresponding to that video frame image from the key frame images; and generate the annotation information of that video frame image based on the annotation information of the target key frame image.
In an optional embodiment, when determining the target key frame image for the frame of video frame image from the key frame images, the obtaining module 133 is configured to: and determining the target key frame image for the frame video frame image from the key frame images based on the first position of the key frame image in the video and the second position of the frame video frame image in the video.
In an optional implementation manner, the data annotation device further includes a third processing module 136, configured to: acquire second pose information of the image acquisition device when acquiring the video; and the first processing module 134, when generating the target acquisition data based on the attribute annotation data and the video, is configured to: generate the target acquisition data based on the attribute annotation data, the video, and the second pose information of the image acquisition device when acquiring the video.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 14, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and includes:
a processor 10 and a memory 20; the memory 20 stores machine-readable instructions executable by the processor 10, the processor 10 being configured to execute the machine-readable instructions stored in the memory 20, the processor 10 performing the following steps when the machine-readable instructions are executed by the processor 10:
generating a region space model corresponding to the target object; wherein the region space model comprises: the positions of the plurality of sub-region space models and the positions of the plurality of sub-region space models respectively correspond to each other; controlling an image acquisition device to acquire an image of the target object based on the region space model to obtain a video of the target object; acquiring attribute labeling data of the video based on the video; and generating target acquisition data based on the attribute labeling data and the video.
The storage 20 includes a memory 210 and an external storage 220; the memory 210 is also referred to as an internal memory, and temporarily stores operation data in the processor 10 and data exchanged with the external memory 220 such as a hard disk, and the processor 10 exchanges data with the external memory 220 through the memory 210.
The specific execution process of the instructions and other executable instructions executed by the processor may refer to the steps of the data labeling method in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the data annotation method in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the data labeling method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
The present disclosure relates to the field of augmented reality. By acquiring image information of a target object in a real environment and then detecting or identifying relevant features, states and attributes of the target object by means of various vision-related algorithms, an AR effect that combines the virtual and the real and matches a specific application can be obtained. For example, the target object may involve a face, a limb, a gesture or an action associated with a human body, or a marker associated with an object, or a sand table, a display area or a display item associated with a venue or place. The vision-related algorithms may involve visual localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and the like. The specific application may involve not only interactive scenes related to real scenes or articles, such as navigation, explanation, reconstruction and superimposed display of virtual effects, but also special effect processing related to people, such as interactive scenes of makeup beautification, body beautification, special effect display and virtual model display. The detection or identification of the relevant features, states and attributes of the target object may be realized through a convolutional neural network, which is a network model obtained by model training based on a deep learning framework.
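For illustration only, a convolutional neural network of the kind mentioned above could be as small as the PyTorch classifier sketched below; the layer sizes and class count are assumptions and do not describe the network actually used in the disclosure:

```python
import torch
import torch.nn as nn

class AttributeClassifier(nn.Module):
    """Tiny CNN mapping an RGB image to attribute class scores."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # global average pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))

# Example: attribute scores for a single 224x224 image.
scores = AttributeClassifier()(torch.randn(1, 3, 224, 224))
```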
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may still, within the technical scope of the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (17)

1. A method for annotating data, comprising:
generating a region space model corresponding to a target object; wherein the region space model comprises: a plurality of sub-region space models and poses respectively corresponding to the plurality of sub-region space models;
controlling an image acquisition device to acquire an image of the target object based on the region space model to obtain a video of the target object;
acquiring attribute labeling data of the video based on the video;
and generating target acquisition data based on the attribute labeling data and the video.
2. The data annotation method of claim 1, wherein the generating a region space model corresponding to the target object comprises:
in response to detecting that the image acquisition device reaches a preset position relative to the target object, acquiring first pose information of the image acquisition device;
generating the region space model based on first pose information of the image acquisition device.
3. The data annotation method of claim 2, wherein said detecting that the image capture device reaches a predetermined position relative to the target object comprises:
controlling the image acquisition device to move above the target object and acquire an overhead view image of the target object;
and in response to the target object being located in a preset area of the overhead view image, determining that the image acquisition device reaches the preset position relative to the target object.
4. The data annotation method of claim 3, wherein the generating the region space model based on the first pose information of the image capture device comprises:
determining a bounding box corresponding to the target object in the overhead view image;
determining a projection position of the region space model in the overhead image based on the position of the bounding box in the overhead image;
determining poses respectively corresponding to a plurality of sub-region space models of the region space model based on the projection position of the region space model in the overhead view image, first pose information of the image acquisition device when the overhead view image is acquired, and parameter information of the target object;
and determining the region space model based on the poses respectively corresponding to the plurality of sub-region space models.
5. The data annotation method of claim 4, wherein the parameter information of the target object includes a height of the target object;
the determining poses respectively corresponding to the plurality of sub-region space models of the region space model based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the parameter information of the target object comprises:
determining an annular cylindrical surface model surrounding the target object based on the projection position of the region space model in the overhead view image, the first pose information of the image acquisition device when acquiring the overhead view image, and the height of the target object;
dividing the annular cylindrical surface model based on a preset size of the sub-region space models to obtain the plurality of sub-region space models;
and determining the pose corresponding to each sub-region space model based on the position of each sub-region space model in the annular cylindrical surface model and the pose of the annular cylindrical surface model.
6. The data annotation method according to any one of claims 1 to 5, wherein the controlling an image acquisition device to acquire images of the target object based on the region space model to obtain a video of the target object comprises:
and controlling the image acquisition device to move around the target object based on the poses respectively corresponding to the plurality of sub-region space models in the region space model, and controlling the image acquisition device to acquire a first video of the target object in the process of reaching the image shooting areas respectively corresponding to the plurality of sub-region space models.
7. The data annotation method of claim 6, wherein the controlling an image acquisition device to acquire images of the target object based on the region space model to obtain a video of the target object comprises:
detecting whether there is a position range corresponding to a target sub-region space model that has not been reached, based on the poses respectively corresponding to the plurality of sub-region space models in the region space model and second pose information of the image acquisition device when acquiring the first video;
and in response to detecting that there is a position range corresponding to a target sub-region space model that has not been reached, controlling the image acquisition device to move to the position range corresponding to the target sub-region space model based on the pose of the target sub-region space model, and acquiring a second video.
8. The data annotation method according to any one of claims 1 to 7, wherein the obtaining attribute annotation data for the video based on the video comprises:
determining key frame images from the video;
and in response to attribute annotation being performed on an object to be annotated in the key frame images, generating the attribute annotation data of the video based on annotation data obtained from the attribute annotation.
9. The data annotation method according to any one of claims 1 to 8, wherein the obtaining attribute annotation data for the video based on the video comprises:
displaying key frame images in the video;
and responding to a first labeling operation on the object to be labeled in the key frame image, and generating labeling data corresponding to the first labeling operation.
10. The data annotation method of claim 8, wherein the attribute annotation of the object to be annotated in the key frame image comprises:
and performing semantic segmentation processing on the key frame image, and generating annotation data of the object to be annotated based on the result of the semantic segmentation processing.
11. The data annotation method of claim 8, wherein the attribute annotation of the object to be annotated in the key frame image comprises:
generating a preview image based on the key frame image, and displaying the preview image; wherein the preview image has a resolution lower than the resolution of the key frame image;
responding to a second labeling operation on the object to be labeled in the preview image, and generating labeling data corresponding to the preview image;
and obtaining the annotation data of the key frame image based on the annotation data corresponding to the preview image.
12. The data annotation method of any one of claims 8 to 11, wherein generating target acquisition data based on the attribute annotation data and the video comprises:
for any video frame image in the video, in response to detecting that the video frame image is a non-key frame image, determining a target key frame image corresponding to the video frame image from the key frame images;
and generating annotation information of the video frame image based on annotation information of the target key frame image.
13. The data annotation method according to claim 12, wherein the determining a target key frame image corresponding to the video frame image from the key frame images comprises:
and determining the target key frame image corresponding to the video frame image from the key frame images based on a first position of each key frame image in the video and a second position of the video frame image in the video.
14. The data annotation method of any one of claims 1-13, wherein the method further comprises:
acquiring second pose information of the image acquisition device when acquiring the video;
the generating target acquisition data based on the attribute annotation data and the video comprises:
and generating the target acquisition data based on the attribute annotation data, the video, and the second pose information of the image acquisition device when acquiring the video.
15. A data annotation device, comprising:
the generating module is used for generating a region space model corresponding to a target object; wherein the region space model comprises: a plurality of sub-region space models and poses respectively corresponding to the plurality of sub-region space models, which an image acquisition device is required to reach when acquiring images of the target object;
the control module is used for controlling the image acquisition device to acquire images of the target object based on the region space model to obtain a video of the target object;
the acquisition module is used for acquiring attribute labeling data of the video based on the video;
and the first processing module is used for generating target acquisition data based on the attribute labeling data and the video.
16. A computer device, comprising: a processor, a memory storing machine readable instructions executable by the processor, the processor for executing the machine readable instructions stored in the memory, the processor performing the steps of the data annotation method of any one of claims 1 to 14 when the machine readable instructions are executed by the processor.
17. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the data annotation method according to any one of claims 1 to 14.
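To make the region space model of claims 4, 5 and 7 more concrete, the sketch below builds an annular cylindrical surface around the target object, splits it into sub-region space models of a preset angular size, and checks which sub-regions the image acquisition device has not yet reached from its recorded positions. The geometry, the tolerance and every name here are illustrative assumptions, not the method as claimed:

```python
import math
from dataclasses import dataclass

@dataclass
class SubRegionSpaceModel:
    index: int
    position: tuple   # (x, y, z) centre of the sub-region on the cylindrical surface
    yaw: float        # heading that points the camera back at the target object

def divide_annular_cylinder(center, radius, height, sector_angle_deg=30.0):
    """Split an annular cylindrical surface around `center` into sub-region space models."""
    cx, cy, cz = center
    count = int(round(360.0 / sector_angle_deg))
    models = []
    for i in range(count):
        theta = math.radians(i * sector_angle_deg)
        x, y = cx + radius * math.cos(theta), cy + radius * math.sin(theta)
        yaw = math.atan2(cy - y, cx - x)          # look towards the target centre
        models.append(SubRegionSpaceModel(i, (x, y, cz + height / 2.0), yaw))
    return models

def unreached_sub_regions(models, device_positions, tolerance=0.5):
    """Sub-region models whose position no recorded device position has come within `tolerance` of."""
    return [m for m in models
            if not any(math.dist(m.position, p) <= tolerance for p in device_positions)]
```

Under these assumptions, the sub-regions returned by unreached_sub_regions would be the ones the device is directed to next before the second video is captured.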
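The preview-image annotation of claim 11 can likewise be illustrated as a plain coordinate rescaling from the lower-resolution preview back to the key frame image; the polygon representation and function name are assumptions made for this sketch:

```python
def preview_annotation_to_key_frame(polygon, preview_size, key_frame_size):
    """Scale a polygon drawn on the preview image back to key frame pixel coordinates.

    polygon: list of (x, y) points in preview coordinates.
    preview_size, key_frame_size: (width, height) of the preview and key frame images.
    """
    sx = key_frame_size[0] / preview_size[0]
    sy = key_frame_size[1] / preview_size[1]
    return [(x * sx, y * sy) for x, y in polygon]

# Example: a box drawn on a 480x270 preview mapped onto a 1920x1080 key frame.
box = preview_annotation_to_key_frame([(10, 10), (100, 10), (100, 60), (10, 60)],
                                      (480, 270), (1920, 1080))
```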
CN202110963207.9A 2021-08-20 2021-08-20 Data labeling method and device, computer equipment and storage medium Withdrawn CN113657308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963207.9A CN113657308A (en) 2021-08-20 2021-08-20 Data labeling method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113657308A true CN113657308A (en) 2021-11-16

Family

ID=78491885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110963207.9A Withdrawn CN113657308A (en) 2021-08-20 2021-08-20 Data labeling method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113657308A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10871774B1 (en) * 2016-09-08 2020-12-22 5X5 Technologies, Inc. Three-dimensional analytic tools and methods for inspections using unmanned aerial vehicles
CN106774410A (en) * 2016-12-30 2017-05-31 易瓦特科技股份公司 Unmanned plane automatic detecting method and apparatus
WO2018204108A1 (en) * 2017-05-03 2018-11-08 General Electric Company System and method for generating three-dimensional robotic inspection plan
CN111401146A (en) * 2020-02-26 2020-07-10 长江大学 Unmanned aerial vehicle power inspection method, device and storage medium
CN112585554A (en) * 2020-03-27 2021-03-30 深圳市大疆创新科技有限公司 Unmanned aerial vehicle inspection method and device and unmanned aerial vehicle
CN112099521A (en) * 2020-10-09 2020-12-18 北京邮电大学 Unmanned aerial vehicle path planning method and device
CN112637541A (en) * 2020-12-23 2021-04-09 平安银行股份有限公司 Audio and video labeling method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANG YANG et al.: "A literature review of UAV 3D path planning", 《BING》 *
LI Changpeng: "Research on Detection Methods of Illegal Buildings Based on UAV Images", China Master's Theses Full-text Database (Electronic Journals), Basic Sciences, vol. 2020, no. 03 *
YANG Chunling; ZHENG Zhaobiao; LI Jinhao: "Adaptive Threshold Adjustment Group Sparse Reconstruction Based on Block Classification in CVS", Journal of South China University of Technology (Natural Science Edition), no. 08 *

Similar Documents

Publication Publication Date Title
CN104376118B (en) The outdoor moving augmented reality method of accurate interest point annotation based on panorama sketch
Zollmann et al. Augmented reality for construction site monitoring and documentation
KR101583286B1 (en) Method, system and recording medium for providing augmented reality service and file distribution system
KR102289745B1 (en) System and method for real-time monitoring field work
CN102638653B (en) Automatic face tracing method on basis of Kinect
WO2023093217A1 (en) Data labeling method and apparatus, and computer device, storage medium and program
JP2010118019A (en) Terminal device, distribution device, control method of terminal device, control method of distribution device, control program, and recording medium
CN104268939A (en) Transformer substation virtual-reality management system based on three-dimensional panoramic view and implementation method of transformer substation virtual-reality management system based on three-dimensional panoramic view
KR101600456B1 (en) Method, system and recording medium for providing augmented reality service and file distribution system
CN107220726A (en) Fire-fighting equipment localization method, mobile terminal and system based on augmented reality
KR102158324B1 (en) Apparatus and method for generating point cloud
JPH06215097A (en) Three-dimensional object cg preparing device
CN113657307A (en) Data labeling method and device, computer equipment and storage medium
CN110858414A (en) Image processing method and device, readable storage medium and augmented reality system
CN111222190A (en) Ancient building management system
CN112733641B (en) Object size measuring method, device, equipment and storage medium
CN113660469A (en) Data labeling method and device, computer equipment and storage medium
CN113838193A (en) Data processing method and device, computer equipment and storage medium
KR102166586B1 (en) Mobile Augmented Reality Service Apparatus and Method Using Deep Learning Based Positioning Technology
CN107538485B (en) Robot guiding method and system
US20230042369A1 (en) Digital reality platform providing data fusion for generating a three-dimensional model of the environment
Schall et al. 3D tracking in unknown environments using on-line keypoint learning for mobile augmented reality
CN113657308A (en) Data labeling method and device, computer equipment and storage medium
CN108694257A (en) It can be to method that point of observation is accurately positioned when browsing BIM models in Web browser
KR101909994B1 (en) Method for providing 3d animating ar contents service using nano unit block

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211116