CN118247446A - Method and device for reconstructing single sign, electronic equipment and storage medium

Method and device for reconstructing single sign, electronic equipment and storage medium

Info

Publication number
CN118247446A
Authority
CN
China
Prior art keywords
dimensional
indication board
target
sign
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211612625.4A
Other languages
Chinese (zh)
Inventor
伍广明
刘驰
谭效良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fengtu Technology Shenzhen Co Ltd
Original Assignee
Fengtu Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fengtu Technology Shenzhen Co Ltd filed Critical Fengtu Technology Shenzhen Co Ltd
Priority to CN202211612625.4A priority Critical patent/CN118247446A/en
Publication of CN118247446A publication Critical patent/CN118247446A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The application provides a method, a device, electronic equipment and a storage medium for reconstructing a single sign, relates to the technical field of three-dimensional reconstruction, and solves the problem that a live-action three-dimensional model produced by an unmanned aerial vehicle cannot effectively reconstruct the live-action of a road and the roadside. The method comprises: acquiring indication board videos obtained by shooting the same target indication board from a plurality of different viewing angles; recognizing the face characters on the target indication board in the indication board video to obtain a character recognition result; performing three-dimensional point cloud reconstruction on the target indication board according to the indication board video to obtain three-dimensional point cloud information of the target indication board; constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information; and fusing a pre-constructed three-dimensional indication board template, the three-dimensional indication board stereoscopic frame and the character recognition result to generate a three-dimensional single indication board model. The application can recognize the complete and clear face characters on the target indication board, and fuses the character recognition result, the three-dimensional indication board template and the three-dimensional indication board stereoscopic frame to generate a three-dimensional single indication board model with the same orientation as the target indication board in the real scene, with a better three-dimensional reconstruction effect and a lower reconstruction cost.

Description

Method and device for reconstructing single sign, electronic equipment and storage medium
Technical Field
The application relates to the technical field of three-dimensional reconstruction, in particular to a method and a device for reconstructing a single sign, electronic equipment and a storage medium.
Background
Live-action three-dimensional reconstruction is applied in many civil fields such as intelligent transportation, smart housing management, pipeline planning and environmental monitoring, and contributes greatly to improving people's living standards. To promote large-scale live-action three-dimensional reconstruction, not only is the reconstruction of large overhead-view scenes such as districts, streets and urban areas required, but also the fine three-dimensional reconstruction of roads and the road signs on both sides of the roads.
The existing technology generally acquires live-action images by unmanned aerial vehicle photography and then performs three-dimensional reconstruction from the images captured by the unmanned aerial vehicle. Limited by the resolution and the restricted viewing angle of current unmanned aerial vehicles, the imaging distance is too far and the viewing angles are incomplete, so the texture and text of an indication board are severely missing; clear images of a road indication board from multiple viewing angles cannot be acquired completely, and a three-dimensional single indication board model with complete and clear text cannot be generated. Therefore, the live-action three-dimensional model produced by an unmanned aerial vehicle cannot effectively reconstruct the live-action three-dimensional scene of a road and the roadside.
Disclosure of Invention
The application provides a method, a device, electronic equipment and a storage medium for reconstructing a single sign, which can solve the problem that a live-action three-dimensional model produced by an unmanned aerial vehicle currently cannot effectively reconstruct the live-action three-dimensional scene of a road and the roadside, can generate a three-dimensional sign model with an accurate geographic position and accurate sign content, and is suitable for large-scale three-dimensional live-action reconstruction.
In one aspect, the application provides a method for reconstructing a single body of a sign, comprising the following steps:
Acquiring indication board videos obtained by shooting the same target indication board from a plurality of different visual angles by video acquisition equipment;
recognizing a face character on the target indication board in the indication board video to obtain a character recognition result;
according to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board;
constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
And fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface character of the three-dimensional single indication board model are the same as those of the target indication board.
In one possible implementation manner of the present application, the sign video includes an image sequence composed of a plurality of two-dimensional sign images, and the identifying the face characters on the target sign in the sign video to obtain a character identification result includes:
Detecting the target indication board from the two-dimensional indication board image to generate an indication board detection frame;
intercepting a target indication board image of a corresponding area of the indication board detection frame;
And recognizing the card face characters in the target indication card image to obtain the character recognition result.
In one possible implementation manner of the present application, the reconstructing the three-dimensional point cloud of the target sign according to the video of the target sign to obtain the three-dimensional point cloud information of the target sign includes:
and carrying out sparse point cloud reconstruction on the target indication board according to an image sequence formed by the multi-frame two-dimensional indication board images to obtain pose parameters when the video acquisition equipment images.
In one possible implementation manner of the present application, the reconstructing the three-dimensional point cloud of the target sign according to the video of the target sign to obtain the three-dimensional point cloud information of the target sign further includes:
Carrying out dense point cloud reconstruction on the target indication board according to an image sequence formed by the multi-frame two-dimensional indication board images and the pose parameters to obtain a three-dimensional dense point cloud of the target indication board;
and taking the three-dimensional dense point cloud of the target indication board as the three-dimensional point cloud information of the target indication board.
In one possible implementation manner of the present application, the constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board includes:
According to the indication board detection frame, the pose parameters and the three-dimensional point cloud information, filtering three-dimensional point clouds outside the indication board detection frame in the three-dimensional point cloud information to obtain filtered three-dimensional point cloud information;
And determining a three-dimensional indication board framework of the target indication board according to the framework formed by the filtered three-dimensional point cloud information.
In one possible implementation manner of the present application, before the fusing the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame and the character recognition result to generate the three-dimensional single sign model of the target sign, the method includes:
determining the longest side of the indication board of the three-dimensional indication board stereoscopic frame in the xy two-dimensional horizontal plane;
Determining the direction of the three-dimensional indication board stereoscopic frame in the xyz three-dimensional space coordinate system according to the azimuth angle of the longest side of the indication board in the xyz three-dimensional space coordinate system;
The xy two-dimensional horizontal plane is a two-dimensional plane formed by the xyz three-dimensional space coordinate system in the x axis and the y axis.
In one possible implementation manner of the present application, before the fusing the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame and the character recognition result to generate the three-dimensional single sign model of the target sign, the method includes:
Determining the height of the indication board of the three-dimensional indication board stereoscopic frame according to the difference value of the maximum z-axis value and the minimum z-axis value of the three-dimensional indication board stereoscopic frame along the z-axis direction;
the z-axis direction is the direction in which the z-axis extends in the xyz three-dimensional space coordinate system.
In one possible implementation manner of the present application, the fusing the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame and the character recognition result to generate the three-dimensional single sign model of the target sign includes:
Scaling and rotating the three-dimensional indication board template according to the indication board orientation and the indication board height of the three-dimensional indication board three-dimensional frame, and fusing the three-dimensional indication board template into the three-dimensional indication board three-dimensional frame to obtain an initial three-dimensional single indication board model;
and fusing the card face characters in the character recognition result to the indication board face of the initial three-dimensional single indication board model to finally obtain the three-dimensional single indication board model.
In another aspect, the present application provides a sign monomer reconstruction apparatus, the apparatus comprising:
The video acquisition module is used for acquiring indication board videos obtained by shooting the same target indication board from a plurality of different view angles by the video acquisition equipment;
The character recognition module is used for recognizing the face characters on the target indication board in the indication board video to obtain a character recognition result;
The point cloud generation module is used for reconstructing the three-dimensional point cloud of the target indication board according to the indication board video to obtain the three-dimensional point cloud information of the target indication board;
The frame construction module is used for constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
The model generation module is used for fusing a pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, and the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
In another aspect, the present application also provides a computer readable storage medium having stored thereon a computer program, the computer program being loaded by a processor to perform the steps of the method for reconstructing a sign monomer.
According to the method for reconstructing a single indication board provided by the application, images of the target indication board can be obtained from multiple viewing angles, and the complete and clear face characters on the target indication board can be recognized to obtain the character recognition result; a three-dimensional indication board stereoscopic frame with the correct position and orientation is conveniently constructed according to the three-dimensional point cloud information of the target indication board; finally, the character recognition result, the three-dimensional indication board template and the three-dimensional indication board stereoscopic frame are fused to generate a three-dimensional single indication board model with the same orientation as the target indication board in the real scene, so that the three-dimensional reconstruction effect is good and the reconstruction cost is low.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a scene of a monomer reconstruction system for a sign provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of an embodiment of a method for reconstructing a monomer of a sign provided in an embodiment of the present application;
FIG. 3 is a schematic structural view of one embodiment of a three-dimensional monomer sign model provided in an embodiment of the present application;
FIG. 4 is a schematic structural view of an embodiment of a single body reconstruction device for a sign provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present application, the term "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment described as "exemplary" in this disclosure is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been described in detail so as not to obscure the description of the application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
In order to facilitate understanding, some technical terms related to the embodiments of the present application are briefly described below.
1. Three-dimensional model: a three-dimensional model is a polygonal representation of an object, typically displayed with a computer or other video device. The displayed object may be a real world entity or an imaginary object. Anything that exists in physical nature can be represented by a three-dimensional model. In the embodiment of the application, the three-dimensional model of the object is used for indicating the three-dimensional structure and the size information of the object. There are various data storage forms of the three-dimensional model, for example, the three-dimensional model is represented in the form of a three-dimensional point cloud, a grid or a voxel, and the data storage forms are not limited herein.
2. Camera external parameters: i.e. the external parameters of the camera, are the conversion relations between the world coordinate system and the camera coordinate system, including rotation parameters and translation parameters.
2.1 World Coordinate System (World Coordinates)
The world coordinate system (x_w, y_w, z_w), also called a measurement coordinate system, is a three-dimensional rectangular coordinate system; based on it, the spatial positions of the camera and the object to be measured can be described, and the position of the world coordinate system can be freely determined according to actual conditions.
2.2 Camera coordinate System (Camera Coordinate)
The camera coordinate system (x_c, y_c, z_c) is also a three-dimensional rectangular coordinate system; its origin is located at the optical center of the lens, the x and y axes are respectively parallel to the two sides of the image plane, and the z axis is the optical axis of the lens and is perpendicular to the image plane.
2.3 Conversion of the world coordinate system into the camera coordinate system

$\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$

wherein $\begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}$ is the extrinsic matrix of the camera, R is a 3×3 rotation matrix obtained as the product of the rotation matrices about each coordinate axis, and t is the 3×1 translation vector $(t_x, t_y, t_z)^{T}$.
According to the camera external parameters, the camera pose, namely the position of the camera in space and the attitude (orientation) of the camera, can be determined; these can be regarded respectively as the translation transformation and the rotation transformation of the camera from an original reference position to the current position. Similarly, the pose of the target object in the present application is the position of the target object in space together with its attitude.
3. Camera internal parameters: the internal parameters of the camera describe the conversion relation between the camera coordinate system and the pixel coordinate system, that is, they convert metric lengths into coordinates expressed in pixels; after a camera leaves the factory, its internal parameters are fixed. Illustratively, the internal parameters of the camera include the camera intrinsic matrix, specifically:

$K = \begin{bmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$

The internal parameters of the camera are respectively: f is the focal length in millimetres; f_x is the focal length along the x-axis expressed in pixels; f_y is the focal length along the y-axis expressed in pixels; u_0 and v_0 are the principal point coordinates (relative to the imaging plane), in pixels; γ is the coordinate axis skew parameter, ideally 0.
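To make the relationship between the external and internal parameters concrete, the following minimal Python sketch (an illustration only, not part of the claimed method; R, t and K are placeholder values) projects a world-coordinate point into pixel coordinates using the matrices defined above.

```python
import numpy as np

# Camera extrinsics: rotation R (3x3) and translation t (3x1), world -> camera.
R = np.eye(3)                         # placeholder rotation
t = np.array([[0.0], [0.0], [5.0]])   # placeholder translation (metres)

# Camera intrinsics K: focal lengths in pixels and principal point (u0, v0).
K = np.array([[1200.0,    0.0, 960.0],
              [   0.0, 1200.0, 540.0],
              [   0.0,    0.0,   1.0]])

def world_to_pixel(P_w: np.ndarray) -> np.ndarray:
    """Project a 3D world point (3,) to pixel coordinates (u, v)."""
    P_c = R @ P_w.reshape(3, 1) + t        # world -> camera coordinates
    uv1 = K @ P_c                          # camera -> homogeneous pixel coordinates
    return (uv1[:2] / uv1[2]).ravel()      # normalise by depth

print(world_to_pixel(np.array([1.0, 2.0, 10.0])))
```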
4. Camera calibration: in image measurement processes and machine vision applications, in order to determine the correlation between the three-dimensional geometric position of a point on the surface of a spatial object and its corresponding point in the image, a geometric model of camera imaging must be established. The parameters of this geometric model are the camera parameters, which include the camera internal parameters, the camera external parameters and the distortion parameters of the camera; in most cases these parameters must be obtained through experiments and calculation, and this process of solving the parameters is called camera calibration (or camera resectioning). Current camera calibration methods include the linear calibration method, the nonlinear optimization calibration method and the two-step calibration method.
5. Three-dimensional template: the method for fusing and reconstructing the indication board and the live-action provided by the application requires that a template database be pre-constructed according to the application scene. The template database is a database storing three-dimensional templates; corresponding three-dimensional templates are prefabricated according to the different types and parameters of indication boards. A three-dimensional template contains the three-dimensional geometric information of a target object, specifically its geometric structure and size information, and optionally also the texture features of the target object. Optionally, the three-dimensional templates in the template database carry labels of the target object type; for example, the three-dimensional templates may include a three-dimensional indication board template, a three-dimensional street lamp template, a three-dimensional traffic signal box template and the like, and each template is preset with an initial angle, which facilitates the calculation of the subsequent rotation angle.
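Purely as an illustration of what such a template database entry might contain (the application does not prescribe a storage schema, and every field name below is hypothetical), a record could look like this:

```python
from dataclasses import dataclass

@dataclass
class SignTemplate:
    """One prefabricated three-dimensional template record (illustrative schema)."""
    template_id: str       # e.g. "sign_rect_large" (hypothetical file name)
    object_type: str       # "indication board", "street lamp", "traffic signal box", ...
    mesh_path: str         # path to the stored 3D geometry (mesh / point cloud file)
    size_m: tuple          # nominal (width, height, depth) in metres
    initial_angle_deg: float = 0.0  # preset initial orientation used for later rotation

templates = {
    "sign_rect_large": SignTemplate("sign_rect_large", "indication board",
                                    "templates/sign_rect_large.obj", (4.0, 2.0, 0.05)),
}
```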
The embodiment of the application provides a method and a device for reconstructing a single sign, electronic equipment and a storage medium, and the method, the device and the storage medium are respectively described in detail below.
The execution main body of the method for reconstructing a single indication board in the embodiment of the present application may be the single indication board reconstruction device provided in the embodiment of the present application, or a different type of electronic device such as a server device, a physical host, or User Equipment (UE) integrated with the single indication board reconstruction device, where the single indication board reconstruction device may be implemented in hardware or software, and the UE may specifically be a terminal device such as a smart phone, a tablet computer, a notebook computer, a palm computer, a desktop computer, or a Personal Digital Assistant (PDA).
The electronic device may be operated in a single operation mode, or may also be operated in a device cluster mode.
As shown in fig. 1, fig. 1 is a schematic view of a scene of a monomer reconstruction system for a sign provided by an embodiment of the present application. The single sign reconstruction system can comprise video acquisition equipment for shooting a target sign and a road three-dimensional real scene and electronic equipment 100 for completing a single sign reconstruction method, wherein a single sign reconstruction device is integrated in the electronic equipment 100. For example, the electronic device may acquire a sign video obtained by the video capture device capturing the same target sign from a plurality of different viewing angles; recognizing the card face characters on the target indication board in the indication board video to obtain a character recognition result; according to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board; constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board; and fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
In addition, as shown in fig. 1, the sign monomer reconstruction system may further include a memory 200 for storing data such as video data, image data, device data of a video capture device for capturing video, and the like.
It should be noted that, the schematic view of the scene of the single-body reconstruction system of the indication board shown in fig. 1 is only an example, and the single-body reconstruction system of the indication board and the scene described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as a person of ordinary skill in the art can know that, along with the evolution of the single-body reconstruction system of the indication board and the appearance of a new service scene, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
In the embodiment of the present application, an electronic device is used as an execution body, and for simplicity and convenience of description, the execution body will be omitted in the subsequent method embodiment, and the method for reconstructing a single sign includes:
Acquiring indication board videos obtained by shooting the same target indication board from a plurality of different visual angles by video acquisition equipment;
Recognizing the card face characters on the target indication board in the indication board video to obtain a character recognition result;
According to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board;
Constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
And fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
According to the method for reconstructing a single indication board provided by the application, images of the target indication board can be obtained from multiple viewing angles, and the complete and clear face characters on the target indication board can be recognized to obtain the character recognition result; a three-dimensional indication board stereoscopic frame with the correct position and orientation is conveniently constructed according to the three-dimensional point cloud information of the target indication board; finally, the character recognition result, the three-dimensional indication board template and the three-dimensional indication board stereoscopic frame are fused to generate a three-dimensional single indication board model with the same orientation as the target indication board in the real scene, so that the three-dimensional reconstruction effect is good and the reconstruction cost is low.
As shown in fig. 2, which is a schematic flow chart of an embodiment of a method for reconstructing a monomer of a notice board in an embodiment of the present application, it should be noted that, although a logic sequence is shown in the flow chart, in some cases, the steps shown or described may be performed in a different sequence from that herein. The monomer reconstruction method of the indication board specifically comprises the following steps 201 to 205:
201. and acquiring the indication board videos obtained by shooting the same target indication board from a plurality of different visual angles by the video acquisition equipment.
The target indication board can be any indication board in a street view that needs to be three-dimensionally reconstructed. The indication board video is a video shot from a plurality of different viewing angles around the target indication board by the video acquisition equipment; it is composed of a sequence of two-dimensional indication board images with a certain degree of overlap, and it contains all visible parts of the target indication board.
In the application process, the video capturing device may be a fixed capturing device, such as a lightning all-in-one device, a camera, or the like, disposed near the target sign, and the video capturing device may also be a mobile capturing device, such as a vehicle-mounted capturing device, which is not specifically limited in this embodiment.
Therefore, the method and the device can acquire indication board videos of the target indication board from a plurality of different viewing angles in real time; the characters and textures in the acquired target indication board images are clearer and closer to the real textures of the target indication board, and since video acquisition equipment is used to acquire the indication board videos of the target indication board, the time efficiency is higher and the cost is lower.
In the embodiment, after the video acquisition device acquires the indication board video of the target indication board, a connection channel is established between the video acquisition device and the electronic device for executing the indication board monomer reconstruction method through the network transmission module, and the indication board video or the image acquired by the video acquisition device is sent to the electronic device for executing the indication board monomer reconstruction method in the form of a message, so that the electronic device for executing the indication board monomer reconstruction method acquires the indication board video of the target indication board, the data transmission cost of the indication board video is reduced, and meanwhile, the transmission efficiency is improved.
In this embodiment, in order to record the geographic information and shooting time information of the shot target indication board, when the video acquisition device transmits the indication board video to the electronic device, the spatio-temporal geographic information of the video acquisition device is transmitted at the same time, where the spatio-temporal geographic information includes information such as coordinate position information, attitude angle information, and timestamp information when the video acquisition device acquires the indication board video of the target indication board, and the data content transmitted by the video acquisition device is not specifically limited in this embodiment.
202. And recognizing the card face characters on the target indication board in the indication board video to obtain a character recognition result.
The sign video includes an image sequence composed of a plurality of two-dimensional sign images, and in this embodiment, the sign face characters on the target sign in the sign video are identified to obtain a character identification result, which specifically includes the following steps 2021 to 2023:
2021. detecting a target indication board from the two-dimensional indication board image, and generating an indication board detection frame;
Because the video acquisition equipment acquires pictures of other elements except the target indication board when acquiring the indication board video, in order to avoid interference of the pictures of the other elements in the indication board video to the identification of the face characters of the target indication board, the area of the target indication board needs to be determined from the two-dimensional indication board image before the face characters on the target indication board in the indication board video are identified, namely, the target indication board needs to be detected from the two-dimensional indication board image at first, and the indication board detection frame is generated.
In this embodiment, detecting a target sign from a two-dimensional sign image, generating a sign detection frame may specifically include:
And taking the two-dimensional indication board image as input, and detecting the target indication board through a preset target indication board detection model to obtain the two-dimensional indication board image containing the target indication board and the indication board detection frame.
In this embodiment, a training-completed target sign detection model is built in advance in an electronic device for executing a sign monomer reconstruction method, an image sequence composed of a plurality of frames of two-dimensional sign images is taken as input, target sign detection is performed on the two-dimensional sign images through the training-completed target sign detection model, and an image sequence with target signs and sign detection frames in each frame is output.
2022. And intercepting a target indication board image of a corresponding area of the indication board detection frame.
Because each frame of two-dimensional indication board image includes the target indication board and other irrelevant areas, in order to avoid interference of those irrelevant areas with the recognition of the face characters of the target indication board, in this embodiment the image of the area of the target indication board is cropped based on the area marked by the indication board detection frame to obtain the target indication board image, and face character recognition is carried out on the target indication board image. In this embodiment, the size of the target indication board image may be the same as the size of the indication board detection frame, that is, the vertices of the cropped target indication board image correspond one-to-one with the vertices of the indication board detection frame; the size of the target indication board image may also differ from that of the indication board detection frame, as long as the face characters of the target indication board are completely contained in the target indication board image, which is not particularly limited in this embodiment.
2023. And recognizing the card face characters in the target indication card image to obtain a character recognition result.
In this embodiment, the target sign image may be analyzed and identified by optical character recognition (Optical Character Recognition, OCR) to obtain the face characters in the target sign image, and returned in text form, thereby obtaining the face characters on the target sign.
The face character recognition of the target indicator is carried out based on one frame of two-dimensional indicator image with the target indicator and the target indicator detection frame in the image sequence, and in the application process, the face character recognition can be carried out based on any frame of two-dimensional indicator image with the target indicator and the target indicator detection frame in the image sequence so as to recognize and obtain the face character in one two-dimensional indicator image;
In order to obtain more complete face characters, face character recognition can be performed based on multiple frames of two-dimensional indication board images or each frame of two-dimensional indication board image in an image sequence to obtain multiple face character results through recognition, and final face character recognition results are determined according to the multiple face character results.
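The following Python sketch illustrates steps 2021 to 2023 under assumptions that the application leaves open: detect_sign is a hypothetical placeholder standing in for the trained target indication board detection model, and pytesseract is used only as one example of an off-the-shelf OCR engine; a simple majority vote across frames is one possible way of merging the per-frame results.

```python
from collections import Counter
import cv2
import pytesseract  # example OCR engine; any OCR module could be substituted

def detect_sign(frame):
    """Hypothetical detector: returns (x1, y1, x2, y2) of the sign detection box."""
    h, w = frame.shape[:2]
    return 0, 0, w, h  # placeholder: whole frame; a trained detector would go here

def recognize_sign_text(frames):
    """Crop the detection-box region of each frame, OCR it, and vote on the result."""
    results = []
    for frame in frames:
        x1, y1, x2, y2 = detect_sign(frame)
        crop = frame[y1:y2, x1:x2]                      # target sign image
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        text = pytesseract.image_to_string(gray, lang="chi_sim+eng").strip()
        if text:
            results.append(text)
    # Majority vote across frames gives the final face-character result.
    return Counter(results).most_common(1)[0][0] if results else ""
```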
203. And carrying out three-dimensional point cloud reconstruction on the target indication board according to the indication board video to obtain three-dimensional point cloud information of the target indication board.
In this embodiment, according to the video of the target sign, three-dimensional point cloud reconstruction is performed on the target sign to obtain three-dimensional point cloud information of the target sign, which specifically includes:
And carrying out sparse point cloud reconstruction on the target indication board according to an image sequence formed by the multi-frame two-dimensional indication board images to obtain pose parameters when the video acquisition equipment images.
In this embodiment, sparse point cloud reconstruction of the target indication board is performed with a structure-from-motion (SfM) algorithm; specifically, the OpenSfM open-source code can be used for the three-dimensional reconstruction. The multi-frame two-dimensional indication board image sequence is taken as input, and image feature points that are robust to scale change and rotation are detected and extracted from the two-dimensional indication board images by the Shi-Tomasi, SIFT or SURF algorithm. In image processing, feature points are points where the image gray value changes sharply, or points of large curvature on image edges (i.e. the intersection points of two edges). The image feature points in a two-dimensional indication board image reflect the essential characteristics of that image and allow the target indication board in the image to be identified; matching of the target indication board across multiple two-dimensional indication board images can be completed by matching the image feature points.
After the image feature points in the two-dimensional indication board images are detected and extracted, the image feature points between every two two-dimensional indication board images in the multi-frame sequence are matched and the corresponding matching points are calculated; the fundamental matrix and the essential matrix are then computed from the matching points, singular value decomposition is performed on the essential matrix to calculate the depth values of the image feature points, i.e. to obtain the positions of the image feature points in three-dimensional space, and finally the three-dimensional sparse point cloud of the target indication board is generated. At the same time, the pose parameters of the video acquisition equipment at imaging time are calculated, where the pose parameters are the position information and attitude information of the video acquisition equipment when shooting a two-dimensional indication board image.
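As a minimal two-view illustration of the sparse reconstruction step (using OpenCV rather than OpenSfM, purely as an assumption for brevity), SIFT features are detected and matched, the essential matrix is estimated, the relative camera pose is recovered, and the inlier matches are triangulated into sparse 3D points:

```python
import cv2
import numpy as np

def two_view_sparse_points(img1, img2, K):
    """Sparse 3D points and relative pose from two overlapping sign images."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match feature descriptors with Lowe's ratio test.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix and relative pose (rotation R, translation t) of the camera.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Triangulate the matched points into a sparse 3D point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T, R, t
```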
According to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board, and further specifically comprising:
Performing dense point cloud reconstruction on the target indication board according to the image sequence formed by the multi-frame two-dimensional indication board images and the pose parameters to obtain a three-dimensional dense point cloud of the target indication board; and taking the three-dimensional dense point cloud of the target indication board as the three-dimensional point cloud information of the target indication board.
In this embodiment, after the pose parameters of the video acquisition equipment at imaging time are obtained, dense point cloud reconstruction of the target indication board is performed with a multi-view stereo (MVS) algorithm to generate a dense three-dimensional point cloud. Specifically, the OpenMVS open-source code can be used for the data processing: the pose parameters of the video acquisition equipment and the multi-frame two-dimensional indication board image sequence are taken as input, pixel-by-pixel depth estimation is performed according to the pose parameters at imaging time and the multi-view image sequence, the dense three-dimensional point cloud is generated, and the three-dimensional point cloud information of the target indication board is finally obtained.
The pixel-by-pixel depth estimation is carried out according to pose parameters and multi-frame two-dimensional indication board image sequences of a plurality of visual angles when the video acquisition equipment images, specifically:
for a pixel p of a certain image feature point in the two-dimensional indication board image, according to the camera internal parameters, the pose parameters and the depth values of the image feature point, calculating to obtain three-dimensional point cloud coordinates in a real space:
$P = D(p)\,T^{-1}K^{-1}p$
wherein P is the three-dimensional point coordinate in the point cloud coordinate system, D(p) is the depth value of the pixel p of the image feature point, T is the camera pose of the video acquisition equipment (comprising a rotation matrix R and a translation vector t), and K is the camera intrinsic matrix of the video acquisition equipment.
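A small numpy sketch of the back-projection formula above, assuming the camera pose T is given as a 4×4 homogeneous matrix and K as a 3×3 intrinsic matrix (the numeric values are placeholders):

```python
import numpy as np

def pixel_to_world(u, v, depth, K, T):
    """Back-project pixel (u, v) with depth D(p) to a 3D point,
    following P = D(p) * T^-1 * K^-1 * p with p in homogeneous pixel coordinates."""
    p = np.array([u, v, 1.0])
    X_cam = depth * (np.linalg.inv(K) @ p)     # point in camera coordinates
    X_cam_h = np.append(X_cam, 1.0)            # homogeneous coordinates
    X_world_h = np.linalg.inv(T) @ X_cam_h     # camera -> world via T^-1
    return X_world_h[:3]

K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)                                  # placeholder camera pose [R | t]
print(pixel_to_world(1000, 500, 12.5, K, T))
```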
204. And constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board.
According to the three-dimensional point cloud information of the target indication board, a three-dimensional indication board three-dimensional frame of the target indication board is constructed, and the method specifically comprises the following steps:
2041. and filtering the three-dimensional point cloud outside the indication board detection frame in the three-dimensional point cloud information according to the indication board detection frame, the pose parameters and the three-dimensional point cloud information to obtain filtered three-dimensional point cloud information.
In order to reduce the influence of the three-dimensional point cloud of the non-indication board outside the indication board detection frame on the construction process of the three-dimensional single indication board model, when three-dimensional point cloud information is connected, the three-dimensional point cloud outside the indication board detection frame is filtered according to the indication board detection frame, the pose parameters and the three-dimensional point cloud information, so that filtered three-dimensional point cloud information is obtained, and the accuracy of the three-dimensional single indication board model formed by the three-dimensional point cloud information is enhanced;
In this embodiment, for any one frame of two-dimensional indication board image and the indication board detection frame detected in that frame, a filtering coordinate interval is determined according to the coordinate values of each corner point of the indication board detection frame and the side lengths of each edge of the detection frame; the three-dimensional points that do not lie within the filtering coordinate interval are removed, so that the three-dimensional point cloud outside the indication board detection frame is filtered out.
Taking the z coordinate of the three-dimensional point cloud information as an example, when the filtering coordinate interval formed by the indication board detection frame in the z-axis direction is [0.5, 6], all three-dimensional points whose z-axis value is smaller than 0.5 or larger than 6 are filtered out, thereby filtering the three-dimensional point cloud outside the indication board detection frame.
In this embodiment, three-dimensional point cloud filtering is performed for each frame of the multi-frame two-dimensional indication board images to obtain multiple groups of three-dimensional points, each group corresponding to one two-dimensional indication board image and lying within its indication board detection frame; the three-dimensional points with the same pose among these groups are then fused through the pose parameters, and the filtered three-dimensional point cloud information is finally obtained.
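A hedged numpy sketch of the interval filtering described above, using the z-interval [0.5, 6] from the example; in practice the interval bounds would be derived from the indication board detection frame and the pose parameters, and the same idea applies to the other axes:

```python
import numpy as np

def filter_points_by_interval(points, z_min=0.5, z_max=6.0):
    """Keep only 3D points whose z value lies inside the filtering interval.

    points: (N, 3) array of the dense point cloud. Apply the same idea per axis
    and per frame, then merge the per-frame results into the filtered cloud.
    """
    mask = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    return points[mask]
```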
2042. And determining the three-dimensional indication board stereoscopic frame of the target indication board according to the frame formed by the filtered three-dimensional point cloud information.
The filtered three-dimensional point cloud information is connected to generate a virtual minimum circumscribed three-dimensional box, and this minimum circumscribed three-dimensional box is the three-dimensional indication board stereoscopic frame of the target indication board.
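One simple way to obtain such a circumscribed box, shown only as an illustrative simplification (an axis-aligned box computed with numpy; the application does not prescribe a particular algorithm):

```python
import numpy as np

def bounding_box_corners(points):
    """Return the 8 corner coordinates of the axis-aligned box enclosing the points."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    xs, ys, zs = zip(mins, maxs)   # (min_x, max_x), (min_y, max_y), (min_z, max_z)
    return np.array([[x, y, z] for x in xs for y in ys for z in zs])
```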
205. And fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
In one possible implementation manner of the present application, before fusing a pre-constructed three-dimensional sign template, a three-dimensional sign stereoscopic frame and a character recognition result to generate a three-dimensional single sign model of a target sign, the method includes:
Determining the longest side of the indication board of the three-dimensional indication board three-dimensional frame in an xy two-dimensional horizontal plane, and determining the indication board orientation of the three-dimensional indication board three-dimensional frame in an xyz three-dimensional space coordinate system according to the azimuth angle of the longest side of the indication board in the xyz three-dimensional space coordinate system, wherein the xyz three-dimensional space coordinate system is a three-dimensional coordinate system formed by an x axis, a y axis and a z axis, and the xy two-dimensional horizontal plane is a two-dimensional plane formed by the xyz three-dimensional space coordinate system in the x axis and the y axis.
In the embodiment, in an xyz three-dimensional space coordinate system of three-dimensional reconstruction, coordinates x, y and z correspond to a real longitude, a latitude and an altitude respectively; in addition, according to the structure of the sign in reality, a fixed rod is usually fixed on the ground, a sign body pointing to the center of the road is fixed on the fixed rod, and the projection of the sign body of the target sign on the real ground corresponds to the projection of the target sign in the xy two-dimensional horizontal plane, so that the projection length of the sign body of the target sign is the longest side of the three-dimensional sign frame in the xy two-dimensional horizontal plane.
After the longest side of the three-dimensional indication board stereoscopic frame in the xy two-dimensional horizontal plane is determined, the pointed direction of the indication board body of the target indication board can be determined according to the azimuth angle of the longest side of the indication board in the xyz three-dimensional space coordinate system.
In one possible implementation manner of the present application, before fusing a pre-constructed three-dimensional sign template, a three-dimensional sign stereoscopic frame and a character recognition result to generate a three-dimensional single sign model of a target sign, the method includes:
and determining the height of the indication board of the three-dimensional indication board stereoscopic frame according to the difference value between the maximum z-axis value and the minimum z-axis value of the three-dimensional indication board stereoscopic frame along the z-axis direction, wherein the z-axis direction is the direction in which the z-axis extends in the xyz three-dimensional space coordinate system.
The z-axis of the xyz three-dimensional space coordinate system corresponds to the actual height, so that the difference value between the maximum z-axis value and the minimum z-axis value of the three-dimensional indication board three-dimensional frame along the z-axis direction is calculated, and the indication board height of the three-dimensional indication board three-dimensional frame can be determined, wherein the z-axis value is the coordinate value of the three-dimensional indication board three-dimensional frame in the z-axis direction in the xyz three-dimensional space coordinate system.
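A hedged numpy sketch of the orientation and height computations: the azimuth of the dominant direction of the xy projection is used here as a stand-in for the azimuth of the longest edge, and the height is the z extent, as described above.

```python
import numpy as np

def sign_orientation_and_height(points):
    """Estimate the sign orientation (azimuth of the dominant xy direction, a stand-in
    for the longest-edge azimuth) and the sign height (z extent of the frame)."""
    xy = points[:, :2] - points[:, :2].mean(axis=0)
    # Principal direction of the xy projection approximates the longest side.
    _, _, vt = np.linalg.svd(xy, full_matrices=False)
    direction = vt[0]                                    # unit vector of largest spread
    azimuth_deg = np.degrees(np.arctan2(direction[1], direction[0]))
    height = points[:, 2].max() - points[:, 2].min()     # max z value minus min z value
    return azimuth_deg, height
```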
In one possible implementation manner of the present application, a three-dimensional single sign model of a target sign is generated by fusing a three-dimensional sign template, a three-dimensional sign stereoscopic frame and a character recognition result, which are constructed in advance, and the method includes:
and scaling and rotating the three-dimensional indication board template according to the indication board orientation and the indication board height of the three-dimensional indication board three-dimensional frame, and fusing the three-dimensional indication board template into the three-dimensional indication board three-dimensional frame to obtain an initial three-dimensional single indication board model.
After the indication board orientation and the indication board height of the three-dimensional indication board stereoscopic frame are determined, the corresponding three-dimensional indication board template can be retrieved from a template database storing various three-dimensional indication board templates. In this embodiment, the file name of the three-dimensional indication board template to be retrieved can be determined according to the indication board orientation and the indication board height of the three-dimensional indication board stereoscopic frame; the template database is searched according to this file name to obtain the target three-dimensional indication board template; the three-dimensional indication board template is read, scaled and rotated according to the indication board orientation and the indication board height of the three-dimensional indication board stereoscopic frame, and the adjusted three-dimensional indication board template is embedded into the three-dimensional indication board stereoscopic frame to obtain an initial three-dimensional single indication board model with clear indication board textures.
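A minimal sketch of the scaling-and-rotation step, assuming the template is available as an (N, 3) vertex array with a known nominal height and preset initial angle (all parameter names are illustrative):

```python
import numpy as np

def fit_template_to_frame(vertices, template_height, template_angle_deg,
                          frame_center, frame_height, frame_azimuth_deg):
    """Scale the template to the frame height, rotate it about the z axis to the
    frame azimuth, and translate it to the frame center."""
    scale = frame_height / template_height
    theta = np.radians(frame_azimuth_deg - template_angle_deg)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    centered = vertices - vertices.mean(axis=0)
    return (scale * centered) @ Rz.T + frame_center
```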
And fusing the card face characters in the character recognition result to the indication board face of the initial three-dimensional single indication board model to finally obtain the three-dimensional single indication board model.
After the initial three-dimensional single indication board model is obtained, the face characters in the recognition result obtained in step 202 are embedded in sequence at the positions of the initial three-dimensional single indication board model where the face characters are displayed, and the three-dimensional single indication board model corresponding to the target indication board is finally obtained; the three-dimensional single indication board model is shown in fig. 3.
Therefore, the method for reconstructing a single indication board provided by the application combines vehicle-mounted positioning and video acquisition equipment to acquire video data of the target indication board, can accurately and cheaply lock the position and size information of the target indication board, and can complete the three-dimensional reconstruction of the target indication board, restoring a three-dimensional single indication board model with the position, texture and face characters of the target indication board. This improves the reconstruction effect of live-action road indication boards and effectively solves the problem that an unmanned aerial vehicle cannot reconstruct a complete and clear three-dimensional indication board model.
In another embodiment of the present application, the target sign detection model may be trained by:
EfficientNet is adopted as the backbone network, and YOLOX or another anchor-free target detection model is adopted as the model to be trained. The set of sample image frames of the indication board sample videos stored in the complete sample library is taken as the input of the model to be trained, and image frames containing indication boards together with the indication board detection frames are taken as the output for model training, so as to obtain the target indication board detection model;
In the training process, the modelling capability of the model to be trained can be improved by data augmentation, which includes, but is not limited to, the following modes:
(1) Random cropping (Random Crop) is performed on the sample image frames in the sample image frame set; specifically, a region with a random ratio of 0.6-1.0 is cropped from each sample image frame, and the cropped sample image frames are used as the input of the model to be trained;
(2) DropBlock layers are embedded in the network; that is, the backbone network comprises a plurality of convolutional layers, pooling layers and DropBlock layers. In a DropBlock layer, a neighbourhood of spatial pixels of size K×R in the feature map is discarded with a drop probability p; for example, the size of the discarded neighbourhood in the feature map may be set to 3×3 and the drop probability to 0.1;
(3) Mosaic augmentation is adopted; specifically, four sample image frames in the sample image frame set are randomly stitched into one mosaic image, and the stitched sample image is used as training data, i.e. as input for model training (a hedged sketch of augmentations (1) and (3) follows this list).
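A hedged numpy/OpenCV sketch of augmentations (1) and (3); the crop ratio range matches the 0.6-1.0 ratio mentioned above, while the mosaic output size is an arbitrary placeholder:

```python
import random
import cv2
import numpy as np

def random_crop(image, min_ratio=0.6, max_ratio=1.0):
    """Crop a random sub-region whose side ratio is drawn from [min_ratio, max_ratio]."""
    h, w = image.shape[:2]
    r = random.uniform(min_ratio, max_ratio)
    ch, cw = int(h * r), int(w * r)
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    return image[y:y + ch, x:x + cw]

def mosaic(images, out_size=640):
    """Stitch four sample frames into one mosaic training image."""
    assert len(images) == 4
    half = out_size // 2
    tiles = [cv2.resize(img, (half, half)) for img in images]
    top = np.hstack(tiles[:2])
    bottom = np.hstack(tiles[2:])
    return np.vstack([top, bottom])
```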
In this embodiment, multi-scale training (Multi-Scale Training, MST) is adopted, so that the risk of model overfitting is reduced and the robustness of the target indication board detection model is enhanced.
In another embodiment of the present application, after constructing the three-dimensional sign stereoscopic frame of the target sign according to the three-dimensional point cloud information of the target sign, the method further includes:
And storing the three-dimensional indication board stereoscopic frame and the character recognition result into an indication board sample database.
The data in the sign sample database can be used for training the target sign detection model, so that continuous accumulation of model training data is realized, and continuous iteration of the target sign detection model is facilitated.
In order to better implement the method for reconstructing the single sign in the embodiment of the present application, based on the method for reconstructing the single sign, the embodiment of the present application further provides a device for reconstructing the single sign, as shown in fig. 4, where the device 300 for reconstructing the single sign includes:
the video acquisition module 301 is configured to acquire a sign video obtained by capturing, by a video acquisition device, the same target sign from a plurality of different viewing angles;
The character recognition module 302 is configured to recognize a face character on a target sign in the sign video, and obtain a character recognition result;
The point cloud generating module 303 is configured to reconstruct a three-dimensional point cloud of the target sign according to the sign video, so as to obtain three-dimensional point cloud information of the target sign;
The frame construction module 304 is configured to construct a three-dimensional frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
The model generating module 305 is configured to fuse the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame and the character recognition result to generate a three-dimensional single sign model of the target sign, where the orientation and the face character of the three-dimensional single sign model are the same as those of the target sign.
The sign video comprises an image sequence formed by multiple frames of two-dimensional sign images, and the character recognition module 302 is specifically configured to: detect the target sign from the two-dimensional sign images and generate a sign detection frame; crop a target sign image from the area corresponding to the sign detection frame; and recognize the face characters in the target sign image to obtain the character recognition result.
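Illustratively, the flow of the character recognition module 302 may be sketched as follows; here detect_sign and recognize_text are hypothetical stand-ins for the trained target sign detection model and an OCR engine, and only the cropping of the detection frame region is concrete.

```python
def recognize_sign_characters(frame, detect_sign, recognize_text):
    # detect_sign returns a pixel-coordinate detection frame (x_min, y_min, x_max, y_max),
    # or None when no sign is present in the frame.
    box = detect_sign(frame)
    if box is None:
        return None
    x_min, y_min, x_max, y_max = box
    sign_patch = frame[y_min:y_max, x_min:x_max]   # crop the region of the detection frame
    return recognize_text(sign_patch)              # recognize the face characters on the sign
```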
The point cloud generating module 303 is specifically configured to: perform sparse point cloud reconstruction on the target sign according to the image sequence formed by the multi-frame two-dimensional sign images, to obtain the pose parameters of the video acquisition device at imaging time.
The point cloud generating module 303 is further configured to: perform dense point cloud reconstruction on the target sign according to the image sequence formed by the multi-frame two-dimensional sign images and the pose parameters, to obtain a three-dimensional dense point cloud of the target sign; and take the three-dimensional dense point cloud of the target sign as the three-dimensional point cloud information of the target sign.
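Illustratively, the sparse reconstruction that recovers the pose parameters may be sketched for a single pair of grayscale frames using OpenCV, as shown below. The camera intrinsic matrix K is assumed to be known, and a complete pipeline would chain many frames and refine the poses with bundle adjustment; this two-view sketch is not the claimed implementation.

```python
import cv2
import numpy as np

def sparse_reconstruct_pair(img1, img2, K):
    # Detect and match SIFT features between two grayscale sign video frames.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Recover the relative camera pose (the pose parameters at imaging time).
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Triangulate the inlier matches into a sparse three-dimensional point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    inliers = mask.ravel().astype(bool)
    pts4d = cv2.triangulatePoints(P1, P2, pts1[inliers].T, pts2[inliers].T)
    points3d = (pts4d[:3] / pts4d[3]).T
    return R, t, points3d
```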
The frame construction module 304 is specifically configured to: filter out, according to the sign detection frame, the pose parameters and the three-dimensional point cloud information, the three-dimensional points located outside the sign detection frame, to obtain filtered three-dimensional point cloud information; and determine the three-dimensional sign stereoscopic frame of the target sign according to the frame formed by the filtered three-dimensional point cloud information.
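Illustratively, the filtering performed by the frame construction module 304 may be sketched as follows, under the assumption that the pose parameters are given as a rotation R, a translation t and an intrinsic matrix K, and that the sign detection frame is given in pixel coordinates.

```python
import numpy as np

def filter_points_by_detection_frame(points3d, R, t, K, box):
    # Project every reconstructed 3D point into the image with the recovered pose,
    # and keep only the points whose projection falls inside the sign detection frame.
    cam = R @ points3d.T + t.reshape(3, 1)       # world coordinates -> camera coordinates
    in_front = cam[2] > 0                        # discard points behind the camera
    uvw = K @ cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]      # perspective projection to pixels
    x_min, y_min, x_max, y_max = box
    inside = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return points3d[in_front & inside]
```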
The single sign reconstruction device 300 further includes a sign orientation determining module, which is specifically configured to: determine the longest sign side of the three-dimensional sign stereoscopic frame in the xy two-dimensional horizontal plane; and determine the orientation of the three-dimensional sign stereoscopic frame in the xyz three-dimensional space coordinate system according to the azimuth angle of the longest sign side in the xyz three-dimensional space coordinate system.
The xyz three-dimensional space coordinate system is a three-dimensional coordinate system formed by the x-axis, the y-axis and the z-axis, and the xy two-dimensional horizontal plane is the two-dimensional plane formed by the x-axis and the y-axis of the xyz three-dimensional space coordinate system.
The single sign reconstruction device 300 further includes a sign height determining module, which is specifically configured to: determine the sign height of the three-dimensional sign stereoscopic frame according to the difference between the maximum z-axis value and the minimum z-axis value of the stereoscopic frame along the z-axis direction.
The z-axis direction is the direction in which the z-axis extends in the xyz three-dimensional space coordinate system.
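Illustratively, the sign orientation and sign height may be derived from the stereoscopic frame as follows; base_corners is assumed to be the ordered footprint corners of the frame in the xy two-dimensional horizontal plane, and z_values the z-axis values of the frame points; both are illustrative inputs rather than part of the claimed method.

```python
import numpy as np

def sign_orientation_and_height(base_corners, z_values):
    # Orientation: azimuth angle of the longest footprint edge in the xy plane.
    pts = np.asarray(base_corners, dtype=float)[:, :2]
    edges = [pts[(i + 1) % len(pts)] - pts[i] for i in range(len(pts))]
    longest = max(edges, key=np.linalg.norm)
    azimuth = np.arctan2(longest[1], longest[0])            # radians about the z axis

    # Height: difference between the maximum and minimum z-axis values.
    height = float(np.max(z_values) - np.min(z_values))
    return azimuth, height
```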
The model generation module 305 is specifically configured to: scale and rotate the three-dimensional sign template according to the sign orientation and the sign height of the three-dimensional sign stereoscopic frame, and fuse it into the stereoscopic frame to obtain an initial three-dimensional single sign model; and fuse the face characters in the character recognition result onto the sign face of the initial three-dimensional single sign model, to finally obtain the three-dimensional single sign model.
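Illustratively, scaling and rotating the pre-constructed three-dimensional sign template into the stereoscopic frame may be sketched as follows; the template is assumed to be an N×3 vertex array centred at the origin with a known template height, and the texture and face-character fusion steps are omitted.

```python
import numpy as np

def fit_template_to_frame(template_vertices, template_height,
                          sign_height, azimuth, frame_center):
    # Scale the template to the measured sign height, rotate it about the z axis
    # to the measured sign orientation, and translate it to the frame centre.
    s = sign_height / template_height
    c, si = np.cos(azimuth), np.sin(azimuth)
    Rz = np.array([[c, -si, 0.0],
                   [si,  c, 0.0],
                   [0.0, 0.0, 1.0]])
    return (s * np.asarray(template_vertices)) @ Rz.T + np.asarray(frame_center)
```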
In another embodiment of the present application, the present application further provides an electronic device 400; fig. 5 shows a schematic structural diagram of the electronic device according to the embodiment of the present application. Specifically:
The electronic device may include a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 5 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components. Wherein:
The processor 401 is the control center of the electronic device; it connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole. Optionally, the processor 401 may include one or more processing cores. The processor 401 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. Preferably, the processor 401 may integrate an application processor, which primarily handles the operating system, user interfaces and application programs, and a modem processor, which primarily handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the electronic device, and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of managing charging, discharging, and power consumption are achieved by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
Acquiring indication board videos obtained by shooting the same target indication board from a plurality of different visual angles by video acquisition equipment;
Recognizing the card face characters on the target indication board in the indication board video to obtain a character recognition result;
According to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board;
Constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
And fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
In some embodiments of the application, the application also provides a computer-readable storage medium, which may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk, and the like. A computer program is stored on the computer-readable storage medium, and the computer program is loaded by a processor to execute the steps in the method for reconstructing a single sign. For example, the computer program loaded by the processor may perform the following steps:
Acquiring indication board videos obtained by shooting the same target indication board from a plurality of different visual angles by video acquisition equipment;
Recognizing the card face characters on the target indication board in the indication board video to obtain a character recognition result;
According to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board;
Constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
And fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
Each of the foregoing embodiments is described with its own emphasis; for portions that are not described in detail in one embodiment, reference may be made to the detailed description of the other embodiments, which is not repeated here.
The method, the device, the electronic equipment and the storage medium for reconstructing a single sign provided by the embodiments of the application are described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help understand the method and the core idea of the application. Meanwhile, those skilled in the art may make changes to the specific implementation and the application scope according to the idea of the application. In summary, the contents of this description should not be construed as limiting the application.

Claims (10)

1. The method for reconstructing the single sign is characterized by comprising the following steps of:
Acquiring indication board videos obtained by shooting the same target indication board from a plurality of different visual angles by video acquisition equipment;
recognizing a face character on the target indication board in the indication board video to obtain a character recognition result;
according to the indication board video, carrying out three-dimensional point cloud reconstruction on the target indication board to obtain three-dimensional point cloud information of the target indication board;
constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
And fusing the pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, wherein the orientation and the board surface character of the three-dimensional single indication board model are the same as those of the target indication board.
2. The method for reconstructing a single sign according to claim 1, wherein the sign video includes an image sequence composed of a plurality of two-dimensional sign images, and the step of recognizing a face character on the target sign in the sign video to obtain a character recognition result includes:
Detecting the target indication board from the two-dimensional indication board image to generate an indication board detection frame;
intercepting a target indication board image of a corresponding area of the indication board detection frame;
And recognizing the card face characters in the target indication card image to obtain the character recognition result.
3. The method for reconstructing a single sign according to claim 2, wherein the reconstructing the three-dimensional point cloud of the target sign according to the video of the sign to obtain the three-dimensional point cloud information of the target sign comprises: and carrying out sparse point cloud reconstruction on the target indication board according to an image sequence formed by the multi-frame two-dimensional indication board images to obtain pose parameters when the video acquisition equipment images.
4. The method for reconstructing a single sign according to claim 3, wherein the reconstructing the three-dimensional point cloud of the target sign according to the video of the sign to obtain the three-dimensional point cloud information of the target sign further comprises:
Carrying out dense point cloud reconstruction on the target indication board according to an image sequence formed by the multi-frame two-dimensional indication board images and the pose parameters to obtain a three-dimensional dense point cloud of the target indication board;
and taking the three-dimensional dense point cloud of the target indication board as the three-dimensional point cloud information of the target indication board.
5. The method for reconstructing a single body of a sign according to claim 3, wherein the constructing a three-dimensional sign stereoscopic frame of the target sign according to the three-dimensional point cloud information of the target sign comprises:
According to the indication board detection frame, the pose parameters and the three-dimensional point cloud information, filtering three-dimensional point clouds outside the indication board detection frame in the three-dimensional point cloud information to obtain filtered three-dimensional point cloud information;
And determining a three-dimensional indication board framework of the target indication board according to the framework formed by the filtered three-dimensional point cloud information.
6. The method of claim 1, wherein prior to fusing the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame, and the character recognition result to generate the three-dimensional single sign model of the target sign, the method comprises:
determining the longest side of the indication board of the three-dimensional indication board stereoscopic frame in the xy two-dimensional horizontal plane;
Determining the direction of the three-dimensional indication board stereoscopic frame in the xyz three-dimensional space coordinate system according to the azimuth angle of the longest side of the indication board in the xyz three-dimensional space coordinate system;
The xy two-dimensional horizontal plane is a two-dimensional plane formed by the xyz three-dimensional space coordinate system in the x axis and the y axis.
7. The method of claim 6, wherein prior to fusing the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame, and the character recognition result to generate the three-dimensional single sign model of the target sign, the method comprises:
Determining the height of the indication board of the three-dimensional indication board stereoscopic frame according to the difference value of the maximum z-axis value and the minimum z-axis value of the three-dimensional indication board stereoscopic frame along the z-axis direction;
the z-axis direction is the direction in which the z-axis extends in the xyz three-dimensional space coordinate system.
8. The method for reconstructing a single sign of claim 7, wherein the fusing the pre-constructed three-dimensional sign template, the three-dimensional sign stereoscopic frame, and the character recognition result to generate the three-dimensional single sign model of the target sign comprises:
Scaling and rotating the three-dimensional indication board template according to the indication board orientation and the indication board height of the three-dimensional indication board three-dimensional frame, and fusing the three-dimensional indication board template into the three-dimensional indication board three-dimensional frame to obtain an initial three-dimensional single indication board model;
and fusing the card face characters in the character recognition result to the indication board face of the initial three-dimensional single indication board model to finally obtain the three-dimensional single indication board model.
9. A sign monomer reconstruction device, the device comprising:
The video acquisition module is used for acquiring indication board videos obtained by shooting the same target indication board from a plurality of different view angles by the video acquisition equipment;
The character recognition module is used for recognizing the face characters on the target indication board in the indication board video to obtain a character recognition result;
The point cloud generation module is used for reconstructing the three-dimensional point cloud of the target indication board according to the indication board video to obtain the three-dimensional point cloud information of the target indication board;
The frame construction module is used for constructing a three-dimensional indication board stereoscopic frame of the target indication board according to the three-dimensional point cloud information of the target indication board;
The model generation module is used for fusing a pre-constructed three-dimensional indication board template, the three-dimensional indication board three-dimensional frame and the character recognition result to generate a three-dimensional single indication board model of the target indication board, and the orientation and the board surface characters of the three-dimensional single indication board model are the same as those of the target indication board.
10. A computer-readable storage medium, having stored thereon a computer program, the computer program being loaded by a processor to perform the steps of the sign monomer reconstruction method of any one of claims 1 to 8.
CN202211612625.4A 2022-12-15 2022-12-15 Method and device for reconstructing single sign, electronic equipment and storage medium Pending CN118247446A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211612625.4A CN118247446A (en) 2022-12-15 2022-12-15 Method and device for reconstructing single sign, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211612625.4A CN118247446A (en) 2022-12-15 2022-12-15 Method and device for reconstructing single sign, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118247446A true CN118247446A (en) 2024-06-25

Family

ID=91563158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211612625.4A Pending CN118247446A (en) 2022-12-15 2022-12-15 Method and device for reconstructing single sign, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118247446A (en)

Similar Documents

Publication Publication Date Title
CN112894832B (en) Three-dimensional modeling method, three-dimensional modeling device, electronic equipment and storage medium
Liang et al. Image based localization in indoor environments
Mastin et al. Automatic registration of LIDAR and optical images of urban scenes
Zhang et al. A UAV-based panoramic oblique photogrammetry (POP) approach using spherical projection
CN113689540B (en) Object reconstruction method and device based on RGB video
CN111473739A (en) Video monitoring-based surrounding rock deformation real-time monitoring method for tunnel collapse area
Zhu et al. Leveraging photogrammetric mesh models for aerial-ground feature point matching toward integrated 3D reconstruction
CN112489099B (en) Point cloud registration method and device, storage medium and electronic equipment
CN110998671B (en) Three-dimensional reconstruction method, device, system and storage medium
Kim et al. Interactive 3D building modeling method using panoramic image sequences and digital map
CN115439607A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN113240734A (en) Vehicle straddle judgment method, device, equipment and medium based on aerial view
CN113345084B (en) Three-dimensional modeling system and three-dimensional modeling method
CN114299230A (en) Data generation method and device, electronic equipment and storage medium
CN117115243B (en) Building group outer facade window positioning method and device based on street view picture
CN116823966A (en) Internal reference calibration method and device for camera, computer equipment and storage medium
CN114766039A (en) Object detection method, object detection device, terminal device, and medium
CN109118576A (en) Large scene three-dimensional reconstruction system and method for reconstructing based on BDS location-based service
CN118247446A (en) Method and device for reconstructing single sign, electronic equipment and storage medium
CN115272450A (en) Target positioning method based on panoramic segmentation
CN118247447A (en) Method and device for reconstructing fusion of indication board and live-action, electronic equipment and storage medium
CN118247448A (en) Road lamp and live-action fusion reconstruction method and device, electronic equipment and storage medium
Bai et al. Visualization pipeline of autonomous driving scenes based on FCCR-3D reconstruction
CN118247420A (en) Green plant and live-action fusion reconstruction method and device, electronic equipment and storage medium
CN117115274B (en) Method, device, equipment and storage medium for determining three-dimensional information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination