CN115546769B - Road image recognition method, device, equipment and computer readable medium - Google Patents

Road image recognition method, device, equipment and computer readable medium

Info

Publication number
CN115546769B
CN115546769B (application CN202211533230.5A)
Authority
CN
China
Prior art keywords
feature map
road
image
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211533230.5A
Other languages
Chinese (zh)
Other versions
CN115546769A (en)
Inventor
蒋建辉
李敏
龙文
艾永军
王倩
申苗
黄家琪
刘智睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GAC Aion New Energy Automobile Co Ltd
Original Assignee
GAC Aion New Energy Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GAC Aion New Energy Automobile Co Ltd filed Critical GAC Aion New Energy Automobile Co Ltd
Priority to CN202211533230.5A priority Critical patent/CN115546769B/en
Publication of CN115546769A publication Critical patent/CN115546769A/en
Application granted granted Critical
Publication of CN115546769B publication Critical patent/CN115546769B/en
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure provides a road image recognition method, apparatus, device and computer readable medium. One embodiment of the method comprises: respectively inputting a plurality of road images into a road sign detection model; performing image fusion on the plurality of road images to obtain a fused road image; inputting the fused road image into a feature extraction network; performing channel compression on the first feature map and the second feature map respectively; up-sampling the second compressed feature map to obtain a second up-sampled feature map, and performing feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map; inputting the fused feature map into a semantic segmentation network to obtain a segmentation feature map; and splicing the segmentation feature map with the third feature map and inputting the result into a decoding network to obtain a road sign segmentation image and the category information of the road signs displayed in it. This embodiment enables more accurate road sign detection.

Description

Road image recognition method, device, equipment and computer readable medium
Technical Field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a road image recognition method, apparatus, device and computer readable medium.
Background
The development of 5G communication and artificial intelligence technology has greatly advanced vehicle driving assistance systems. Existing driving assistance systems rely heavily on computer vision technology.
However, such driving assistance systems often face the following technical problems:
First, for road sign detection, an image acquisition device mounted on the vehicle captures a road image, which is then segmented to detect the road signs, enabling the vehicle to localize itself on the road and adaptively adjust its driving parameters. This detection approach, however, is easily affected by environmental factors: in dim light or in rainy and snowy weather, detection readily becomes inaccurate.
Second, when a new energy vehicle runs low on charge, it often has to find a charging pile at short notice. The driver does not know the queuing situation or the number of charging piles at nearby charging points, and so must decide from experience where to charge. As a result, vehicles queue for long periods at some charging points while charging piles sit idle at others, leaving charging pile usage unbalanced. Moreover, because charge is consumed while driving to a charging point, a driver who arrives to find the wait too long may no longer have enough remaining charge to reach another charging point, further aggravating the queuing problem.
The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and therefore may contain information that does not form prior art already known to a person of ordinary skill in the art in this country.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a road image recognition method, apparatus, device and computer readable medium to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a road image recognition method, including: collecting a plurality of road images at different viewing angles through a plurality of image acquisition devices installed on a vehicle; respectively inputting the road images into a road sign detection model to obtain a road sign detection result corresponding to each road image, the road sign detection result including a road sign bounding box and a confidence; performing image fusion on the plurality of road images according to the confidence corresponding to each road image to obtain a fused road image; inputting the fused road image into a feature extraction network, the feature extraction network including a plurality of feature extraction layers for outputting a plurality of feature maps of different sizes, with the feature map of the largest size among them taken as a first feature map, the feature map of the smallest size as a second feature map, and a feature map of a size smaller than the first feature map and larger than the second feature map as a third feature map; performing channel compression on the first feature map and the second feature map respectively to obtain a first compressed feature map and a second compressed feature map, the channel numbers of the first compressed feature map and the second compressed feature map being consistent; up-sampling the second compressed feature map to obtain a second up-sampled feature map, the size of the second up-sampled feature map being consistent with that of the first compressed feature map, and performing feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map; inputting the fused feature map into a semantic segmentation network to obtain a segmentation feature map; and splicing the segmentation feature map with the third feature map and inputting the result into a decoding network to obtain a road sign segmentation image and the category information of the road signs displayed in the road sign segmentation image.
In a second aspect, some embodiments of the present disclosure provide a road image recognition apparatus, including: an acquisition unit configured to acquire a plurality of road images at different viewing angles through a plurality of image acquisition devices mounted on a vehicle; a detection unit configured to input the road images into a road sign detection model respectively to obtain a road sign detection result corresponding to each road image, the road sign detection result including a road sign bounding box and a confidence; a fusion unit configured to perform image fusion on the plurality of road images according to the confidence corresponding to each road image to obtain a fused road image; an extraction unit configured to input the fused road image into a feature extraction network that includes a plurality of feature extraction layers for outputting a plurality of feature maps of different sizes, taking the feature map of the largest size among them as a first feature map, the feature map of the smallest size as a second feature map, and a feature map of a size smaller than the first feature map and larger than the second feature map as a third feature map; a compression unit configured to perform channel compression on the first feature map and the second feature map respectively to obtain a first compressed feature map and a second compressed feature map with consistent channel numbers; an up-sampling unit configured to up-sample the second compressed feature map to obtain a second up-sampled feature map whose size is consistent with that of the first compressed feature map, and to perform feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map; a segmentation unit configured to input the fused feature map into a semantic segmentation network to obtain a segmentation feature map; and a decoding unit configured to splice the segmentation feature map and the third feature map and input the result into a decoding network to obtain a road sign segmentation image and the category information of the road signs displayed in the road sign segmentation image.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following advantages: more accurate road sign detection is achieved. Specifically, existing road sign detection methods are inaccurate because existing image segmentation models cannot cope with extreme scenes such as dim light or rainy and snowy weather. Some embodiments of the present disclosure therefore first consider a plurality of road images at different viewing angles together: in such extreme scenes, combining different viewing angles effectively enhances image quality. In this process, image fusion is guided by the confidence, so that road images with higher confidence receive higher weight, which effectively enhances the image quality of the fused road image. A further source of inaccuracy is that existing methods use only the feature map output by the last feature extraction layer, so that much of the information in the feature map is lost. Some embodiments of the present disclosure therefore make full use of the first feature map, which has the largest size, loses little information and retains the most features; it is fused with the second feature map to obtain a fused feature map that is input to the semantic segmentation network, improving detection accuracy.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.
FIG. 1 is a flow diagram of some embodiments of a road image identification method according to the present disclosure;
FIG. 2 is a schematic block diagram of some embodiments of a road image recognition device according to the present disclosure;
FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and the embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a" and "an" in this disclosure are illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, a flow 100 of some embodiments of a road image identification method according to the present disclosure is shown. The road image identification method comprises the following steps:
Step 101, a plurality of road images at different viewing angles are collected through a plurality of image acquisition devices installed on the vehicle.
The image acquisition devices may be mounted at various positions on the vehicle and may include, for example, 2 front-view cameras, 3 rear-view cameras and 6 surround-view cameras.
Step 102, respectively inputting the road images into a road sign detection model to obtain a road sign detection result corresponding to each road image, wherein the road sign detection result comprises a road sign bounding box and a confidence.
In some embodiments, the road sign detection model may be a YOLO network. The YOLO series are regression-based deep learning methods: YOLO regresses bounding box locations and the corresponding object confidences directly in its output layer.
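For illustration only, this detection step can be sketched as below; the detector interface (a callable returning boxes and scores) is an assumption, since the patent does not fix a concrete API:

```python
from dataclasses import dataclass

@dataclass
class RoadSignDetection:
    """Per-image output of the road sign detection model (step 102)."""
    bbox: tuple          # (x1, y1, x2, y2) bounding box of the road sign
    confidence: float    # YOLO-style objectness/confidence score

def detect_road_signs(detector, image):
    # `detector` stands in for any YOLO-style model that regresses
    # bounding boxes and confidences in its output layer.
    boxes, scores = detector(image)
    return [RoadSignDetection(tuple(b), float(s)) for b, s in zip(boxes, scores)]
```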
Step 103, performing image fusion on the plurality of road images according to the confidence corresponding to each road image to obtain a fused road image.
In some embodiments, each road image may be weighted according to its confidence, with higher confidence receiving greater weight. The road images are then summed pixel-wise according to their weights to obtain the fused road image.
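A minimal sketch of this confidence-weighted fusion, assuming the images are co-registered (same size and viewpoint-aligned); normalizing the weights to sum to one is an implementation choice, not something the patent mandates:

```python
import numpy as np

def fuse_road_images(images, confidences):
    """Step 103 as a pixel-wise weighted average: road images with
    higher confidence contribute more to the fused road image."""
    imgs = np.stack([img.astype(np.float32) for img in images])  # (N, H, W, C)
    w = np.asarray(confidences, dtype=np.float32)
    w = w / w.sum()                               # normalize weights
    fused = np.tensordot(w, imgs, axes=1)         # weighted pixel-wise sum
    return np.clip(fused, 0, 255).astype(np.uint8)
```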
For example, performing image fusion on the plurality of road images according to the confidence corresponding to each road image may include: in response to determining that the confidences corresponding to a first preset number of the road images are all smaller than a preset confidence threshold, acquiring a historical road image corresponding to the current position information of the vehicle; and performing image fusion on the plurality of road images and the historical road image according to the confidence corresponding to each image to obtain the fused road image, wherein the confidence corresponding to the historical road image is obtained by weighting a plurality of image quality indicator scores of the historical road image. An image quality indicator score may be, for example, an image brightness score or an image resolution score. Introducing the historical road image effectively removes the influence of the current environmental factors, and tying its confidence to its image quality indicator scores raises the weight of high-quality images, further improving the image quality of the fused road image. In practice, besides its image quality indicator scores, the confidence of the historical road image must also account for how well its scene matches the current position. Each historical road image is stored in association with its position information, so the distance between the two positions can be determined from the position information (longitude and latitude, or coordinates) of the historical road image and the current position information; the confidence of the historical road image decreases with this distance. The confidence $c_h$ corresponding to the historical road image may, for example, be determined by the following formula:

$$c_h = \frac{\alpha \cdot s}{1 + \beta \cdot d}$$

where $s$ is the weighted sum of the plurality of image quality indicator scores of the historical road image, $\alpha$ and $\beta$ are preset coefficients, and $d$ is the distance above. This formula yields an appropriate confidence for the historical road image: it improves the image quality of the fused road image while preserving its match with the current position, avoiding the inaccurate detection that results when the scene match is poor because the distance is too large.
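A one-line helper matching the formula above (the formula itself is a reconstruction, so the exact functional form is an assumption):

```python
def historical_confidence(quality_scores, quality_weights, alpha, beta, distance):
    """Confidence of a stored historical road image: a quality-weighted
    score attenuated by the distance between the stored position and
    the vehicle's current position."""
    s = sum(w * q for w, q in zip(quality_weights, quality_scores))
    return alpha * s / (1.0 + beta * distance)
```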
Step 104, inputting the fused road image into a feature extraction network, wherein the feature extraction network comprises a plurality of feature extraction layers for outputting a plurality of feature maps of different sizes; the feature map of the largest size among them is taken as a first feature map, the feature map of the smallest size as a second feature map, and a feature map of a size smaller than the first feature map and larger than the second feature map as a third feature map.
In some embodiments, the feature extraction network may include a plurality of feature extraction layers, which may be convolutional layers with different convolution kernel sizes, so that feature maps of different sizes are obtained. The convolutional layers may be connected in series, i.e., the output of one convolutional layer serves as the input of the next.
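A hedged PyTorch sketch of such a serial backbone; the three stages, the stride-2 convolutions and the channel widths are illustrative assumptions, since the patent only requires serially connected layers producing feature maps of decreasing size:

```python
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    """Feature extraction network of step 104: serial convolutional
    stages whose intermediate outputs supply the first (largest),
    third (middle) and second (smallest) feature maps."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        first = self.stage1(x)       # largest size  -> first feature map
        third = self.stage2(first)   # middle size   -> third feature map
        second = self.stage3(third)  # smallest size -> second feature map
        return first, second, third
```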
Step 105, performing channel compression on the first feature map and the second feature map respectively to obtain a first compressed feature map and a second compressed feature map, wherein the channel numbers of the first compressed feature map and the second compressed feature map are consistent. For example, channel compression may be performed by 1x1 convolution.
Step 106, up-sampling the second compressed feature map to obtain a second up-sampled feature map whose size is consistent with that of the first compressed feature map, and performing feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map.
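Steps 105 and 106 together, as a sketch; element-wise addition is one common fusion operator, and using it here is an assumption since the patent does not fix how the two compressed feature maps are fused:

```python
import torch.nn as nn
import torch.nn.functional as F

class CompressAndFuse(nn.Module):
    """1x1-convolution channel compression (step 105), bilinear
    up-sampling of the second compressed feature map to the first
    compressed feature map's size, and feature fusion (step 106)."""
    def __init__(self, ch_first, ch_second, ch_common=64):
        super().__init__()
        self.squeeze_first = nn.Conv2d(ch_first, ch_common, kernel_size=1)
        self.squeeze_second = nn.Conv2d(ch_second, ch_common, kernel_size=1)

    def forward(self, first_map, second_map):
        c1 = self.squeeze_first(first_map)    # first compressed feature map
        c2 = self.squeeze_second(second_map)  # second compressed feature map
        c2 = F.interpolate(c2, size=c1.shape[-2:], mode="bilinear",
                           align_corners=False)  # second up-sampled feature map
        return c1 + c2                        # fused feature map
```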
Step 107, inputting the fused feature map into a semantic segmentation network to obtain a segmentation feature map.
The semantic segmentation network may adopt FCN (Fully Convolutional Networks), U-Net or similar networks; an FCN can learn deep semantic features and map them back to the original image for pixel-level prediction. In practice, the semantic segmentation network may be pre-trained on a sample set.
Optionally, the semantic segmentation network includes a plurality of expansion (dilated) convolutional layers with correspondingly different expansion rates, and inputting the fused feature map into the semantic segmentation network to obtain the segmentation feature map includes: inputting the fused feature map into a first expansion convolutional layer of the plurality of expansion convolutional layers to obtain a first expansion feature map; performing channel splicing on the first expansion feature map and the fused feature map and inputting the result into a second expansion convolutional layer of the plurality of expansion convolutional layers to obtain a second expansion feature map; and splicing the first expansion feature map and the second expansion feature map to obtain the segmentation feature map. Because expansion convolution can lose detail, road signs might otherwise appear incomplete in the road sign segmentation image; splicing the outputs of expansion convolutional layers with different expansion rates yields a denser receptive field and denser pixel extraction, further improving detection accuracy.
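A sketch of this optional two-layer arrangement; the expansion (dilation) rates 2 and 4 are assumptions, the patent only requiring that the rates differ:

```python
import torch
import torch.nn as nn

class DilatedSegHead(nn.Module):
    """Two expansion (dilated) convolutional layers with channel
    splicing, producing the segmentation feature map."""
    def __init__(self, ch):
        super().__init__()
        self.expand1 = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)
        self.expand2 = nn.Conv2d(2 * ch, ch, 3, padding=4, dilation=4)

    def forward(self, fused):
        e1 = self.expand1(fused)                          # first expansion feature map
        e2 = self.expand2(torch.cat([e1, fused], dim=1))  # channel-spliced input
        return torch.cat([e1, e2], dim=1)                 # segmentation feature map
```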
Step 108, splicing the segmentation feature map with the third feature map and inputting the result into a decoding network to obtain the road sign segmentation image and the category information of the road signs displayed in the road sign segmentation image.
The decoding network may include a plurality of up-sampling layers and a feature stack layer. The feature stack layer performs channel adjustment on the input feature map to obtain an adjusted feature map; the adjusted feature map is then passed through the up-sampling layers to restore its size, finally yielding a road sign segmentation image with the same length and width as the input image.
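One way to realize such a decoder, sketched under the assumption of two 2x up-sampling layers; the depth and widths would depend on the backbone's downsampling factor:

```python
import torch.nn as nn

def make_decoder(in_ch, num_classes, up_layers=2):
    """Decoding network: a 1x1 'feature stack' layer adjusts channels,
    then up-sampling layers restore the input image's length and width."""
    layers = [nn.Conv2d(in_ch, num_classes, kernel_size=1)]  # channel adjustment
    for _ in range(up_layers):
        layers += [nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                   nn.Conv2d(num_classes, num_classes, 3, padding=1)]
    return nn.Sequential(*layers)
```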
Some embodiments of the present disclosure provide methods that enable more accurate road sign detection. Specifically, existing road sign detection methods are inaccurate because existing image segmentation models cannot cope with extreme scenes such as dim light or rainy and snowy weather. Some embodiments of the present disclosure therefore first consider a plurality of road images at different viewing angles together: in such extreme scenes, combining different viewing angles effectively enhances image quality. In this process, image fusion is guided by the confidence, with higher-confidence road images receiving higher weight, which effectively enhances the image quality of the fused road image. A further source of inaccuracy is that existing methods use only the feature map output by the last feature extraction layer, losing much of the information in the feature map. Some embodiments of the present disclosure therefore make full use of the first feature map, which has the largest size, loses little information and retains the most features; it is fused with the second feature map to obtain a fused feature map that is input to the semantic segmentation network, improving detection accuracy.
In some alternative implementations of some embodiments, to further improve detection accuracy, the characteristics of road signs are fully exploited: road signs are generally regular figures, and particular line types and colors usually carry specific meanings. Each road sign can therefore be modeled in advance; for example, a double solid line can be modeled by two straight-line functions. Some embodiments of the disclosure further comprise the following steps:
Step one, according to the category information, determining a road sign model matching the road sign segmentation image from a pre-constructed road sign model library, wherein the road sign models in the road sign model library describe the shape and the line segment type of road signs.
Step two, correcting the road sign segmentation image by using the road sign model to obtain a corrected road sign segmentation image.
For example, if the road sign model consists of two straight-line functions, the functions may be fitted to the foreground in the segmentation image; foreground pixels outside the lines are deleted and non-foreground pixels on the lines are filled in, yielding a candidate corrected road sign segmentation image. Since the line segment type may be dotted or solid, the candidate image can then be corrected a second time against the corresponding line type to obtain the final corrected road sign segmentation image, providing a basis for accurate driving assistance.
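A numpy sketch of this correction for a straight-line road sign model; the least-squares fit and the pixel tolerance `half_width` are assumptions for illustration. For a double solid line, the foreground would first be split into two clusters and `fit_line` applied to each:

```python
import numpy as np

def fit_line(mask):
    """Fit a straight line y = k*x + b to the foreground pixels of a
    binary segmentation mask (one line of the road sign model)."""
    ys, xs = np.nonzero(mask)
    k, b = np.polyfit(xs, ys, deg=1)
    return k, b

def correct_with_line_model(mask, line_params, half_width=3.0):
    """Rasterize the fitted model lines back into a corrected mask:
    foreground off the lines is deleted, gaps on the lines are filled."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    corrected = np.zeros((h, w), dtype=np.uint8)
    for k, b in line_params:
        # perpendicular distance of every pixel to the line y = k*x + b
        dist = np.abs(k * xs - ys + b) / np.sqrt(k * k + 1.0)
        corrected[dist <= half_width] = 1
    return corrected
```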
In some optional implementations of some embodiments, the following steps address the second technical problem described in the background section: when a new energy vehicle runs low on charge, it must find a charging pile at short notice, and a driver who does not know the queuing situation or the number of charging piles at nearby charging points must choose one from experience, which leaves vehicles queuing for long periods at some charging points while charging piles sit idle at others, causing unbalanced use of the charging piles. Furthermore, charge consumed while driving to a charging point may leave too little remaining charge to reach an alternative if the wait there proves too long, further aggravating the queuing problem. Some embodiments of the disclosure therefore further comprise the following steps:
Step one, monitoring the remaining charge while the vehicle is driving, and in response to the remaining charge being lower than a first charge threshold, reading, through an electronic map interface, the charging point information of a plurality of charging points within a preset range of the current position, wherein the charging point information comprises position information and an estimated queuing time;
Step two, determining, from the position information of each charging point, the offset distance $d_1$ of that charging point relative to the currently planned route and its distance $d_2$ from the current position;

Step three, acquiring the estimated queuing time $t$ of each of the plurality of charging points through the cloud;
Step four, sorting the charging point information according to the offset distance, the distance from the current position and the estimated queuing time to obtain a charging point information sequence. First, the charging points satisfying the following condition (reachable on the remaining charge) are screened to obtain a candidate charging point information set:

$$e \cdot d_2 < q$$

On this basis, the score of each candidate charging point $i$ is determined by the following formula:

$$score_i = \frac{q - e \cdot d_{2,i}}{\lambda_1 \cdot d_{1,i} + \lambda_2 \cdot t_i}$$

and the charging points are sorted from highest to lowest score to obtain the charging point information sequence, wherein $e$ is the power consumption per unit distance, $q$ is the remaining charge, and $\lambda_1$ and $\lambda_2$ are preset coefficients (a code sketch of this screening and scoring follows these steps);
Step five, displaying the charging point information sequence in order on the on-board system screen of the vehicle and prompting the user to select target charging point information;
Step six, receiving the target charging point information selected by the user and re-planning the route.
In this way, the unbalanced use of charging piles and the need for users to pick a charging point from experience are both avoided, so charging pile usage becomes more balanced while users' queuing time is reduced.
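The screening and scoring of step four, sketched with the formulas reconstructed above; the dictionary keys and the exact formula form are assumptions:

```python
def rank_charging_points(points, q_remaining, e_per_dist, lam1, lam2):
    """Keep only charging points reachable on the remaining charge
    (e * d2 < q), then sort them by score: a larger remaining-charge
    margin, a smaller route offset d1 and a shorter estimated
    queuing time t all raise a point's score."""
    reachable = [p for p in points if e_per_dist * p["d2"] < q_remaining]

    def score(p):
        return (q_remaining - e_per_dist * p["d2"]) / (lam1 * p["d1"] + lam2 * p["t"])

    return sorted(reachable, key=score, reverse=True)

# Hypothetical usage: two charging points, remaining charge 8.0 kWh,
# consumption 0.2 kWh per km, preset coefficients 1.0 and 0.1.
points = [{"id": "A", "d1": 0.5, "d2": 3.0, "t": 20},
          {"id": "B", "d1": 2.0, "d2": 1.0, "t": 45}]
sequence = rank_charging_points(points, q_remaining=8.0, e_per_dist=0.2,
                                lam1=1.0, lam2=0.1)
```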
With further reference to fig. 2, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of a road image recognition apparatus, which correspond to the method embodiments illustrated in fig. 1; the apparatus may be particularly applied to various electronic devices.
As shown in fig. 2, a road image recognition apparatus 200 of some embodiments includes: an acquisition unit 201 configured to acquire a plurality of road images at different viewing angles through a plurality of image acquisition devices mounted on a vehicle; a detection unit 202 configured to input the road images into a road sign detection model respectively to obtain a road sign detection result corresponding to each road image, the road sign detection result including a road sign bounding box and a confidence; a fusion unit 203 configured to perform image fusion on the plurality of road images according to the confidence corresponding to each road image to obtain a fused road image; an extraction unit 204 configured to input the fused road image into a feature extraction network that includes a plurality of feature extraction layers for outputting a plurality of feature maps of different sizes, taking the feature map of the largest size among them as a first feature map, the feature map of the smallest size as a second feature map, and a feature map of a size smaller than the first feature map and larger than the second feature map as a third feature map; a compression unit 205 configured to perform channel compression on the first feature map and the second feature map respectively to obtain a first compressed feature map and a second compressed feature map with consistent channel numbers; an up-sampling unit 206 configured to up-sample the second compressed feature map to obtain a second up-sampled feature map whose size is consistent with that of the first compressed feature map, and to perform feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map; a segmentation unit 207 configured to input the fused feature map into a semantic segmentation network to obtain a segmentation feature map; and a decoding unit 208 configured to splice the segmentation feature map and the third feature map and input the result into a decoding network to obtain a road sign segmentation image and the category information of the road signs displayed in the road sign segmentation image.
It will be understood that the units described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 200 and the units included therein, and are not described herein again.
Referring now to FIG. 3, a block diagram of an electronic device 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device in some embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate with other devices, wireless or wired, to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 3 may represent one device or may represent multiple devices, as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (Hyper Text Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet) and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: collect a plurality of road images at different viewing angles through a plurality of image acquisition devices installed on a vehicle; respectively input the road images into a road sign detection model to obtain a road sign detection result corresponding to each road image, the road sign detection result including a road sign bounding box and a confidence; perform image fusion on the plurality of road images according to the confidence corresponding to each road image to obtain a fused road image; input the fused road image into a feature extraction network, the feature extraction network including a plurality of feature extraction layers for outputting a plurality of feature maps of different sizes, with the feature map of the largest size taken as a first feature map, the feature map of the smallest size as a second feature map, and a feature map of a size smaller than the first feature map and larger than the second feature map as a third feature map; perform channel compression on the first feature map and the second feature map respectively to obtain a first compressed feature map and a second compressed feature map with consistent channel numbers; up-sample the second compressed feature map to obtain a second up-sampled feature map whose size is consistent with that of the first compressed feature map, and perform feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map; input the fused feature map into a semantic segmentation network to obtain a segmentation feature map; and splice the segmentation feature map with the third feature map and input the result into a decoding network to obtain a road sign segmentation image and the category information of the road signs displayed in the road sign segmentation image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or hardware. The described elements may also be provided within a processor, and the names of the elements do not in some cases constitute limitations on the elements themselves.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only of preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the features above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, technical solutions in which the above features are replaced by (but not limited to) features with similar functions disclosed in the embodiments of the present disclosure.

Claims (7)

1. A road image recognition method, comprising:
collecting a plurality of road images at different viewing angles through a plurality of image collecting devices installed on a vehicle;
respectively inputting the road images into a road sign detection model to obtain a road sign detection result corresponding to each road image, wherein the road sign detection result comprises a road sign bounding box and a confidence;
performing image fusion on the plurality of road images according to the confidence corresponding to each road image to obtain a fused road image;
inputting the fused road image into a feature extraction network, wherein the feature extraction network comprises a plurality of feature extraction layers, the feature extraction layers are used for outputting a plurality of feature maps with different sizes, a feature map with the largest size in the feature maps with different sizes is used as a first feature map, a feature map with the smallest size is used as a second feature map, and a feature map with the size smaller than the first feature map and larger than the second feature map is used as a third feature map;
respectively performing channel compression on the first feature map and the second feature map to obtain a first compressed feature map and a second compressed feature map, wherein the channel numbers of the first compressed feature map and the second compressed feature map are consistent;
up-sampling the second compressed feature map to obtain a second up-sampled feature map, the size of the second up-sampled feature map being consistent with that of the first compressed feature map, and performing feature fusion on the second up-sampled feature map and the first compressed feature map to obtain a fused feature map;

inputting the fused feature map into a semantic segmentation network to obtain a segmentation feature map;

and splicing the segmentation feature map with the third feature map and inputting the result into a decoding network to obtain a road sign segmentation image and the category information of the road signs displayed in the road sign segmentation image.
2. The method according to claim 1, wherein before performing image fusion on the road images according to the confidence corresponding to each road image to obtain a fused road image, the method further comprises:

in response to determining that the confidences corresponding to a first preset number of the plurality of road images are all smaller than a preset confidence threshold;
acquiring a historical road image corresponding to the current position information of the vehicle; and
the image fusion of the road images according to the confidence degree corresponding to each road image to obtain a fused road image comprises the following steps:
and carrying out image fusion on the multiple road images and the historical road image according to the confidence coefficient corresponding to each road image to obtain a fused road image, wherein the confidence coefficient corresponding to the historical road image is obtained by weighting multiple image quality index scores of the historical road image.
3. The method of claim 1 or 2, wherein the semantic segmentation network comprises a plurality of expansion convolutional layers with correspondingly different expansion rates; and

the inputting the fused feature map into a semantic segmentation network to obtain a segmentation feature map comprises:

inputting the fused feature map into a first expansion convolutional layer of the plurality of expansion convolutional layers to obtain a first expansion feature map;

performing channel splicing on the first expansion feature map and the fused feature map, and inputting the result into a second expansion convolutional layer of the plurality of expansion convolutional layers to obtain a second expansion feature map;

and splicing the first expansion feature map and the second expansion feature map to obtain the segmentation feature map.
4. The method of claim 3, wherein the method further comprises:
determining a road sign model matched with the road sign segmentation image from a road sign model library constructed in advance according to the category information, wherein the road sign model in the road sign model library is used for describing the shape and the line segment type of a road sign;
and correcting the road sign segmentation image by using the road sign model to obtain a corrected road sign segmentation image.
5. A road image recognition device comprising:
an acquisition unit configured to acquire a plurality of road images of different viewpoints through a plurality of image acquisition devices mounted on a vehicle;
a detection unit configured to input the road images into a road detection model respectively to obtain a road detection result corresponding to each road image, the road detection result comprising a road sign bounding box and a confidence degree;
a fusion unit configured to perform image fusion on the road images according to the confidence degree corresponding to each road image to obtain a fused road image;
an extraction unit configured to input the fused road image into a feature extraction network including a plurality of feature extraction layers for outputting a plurality of feature maps of different sizes, and to take a feature map of a largest size among the plurality of feature maps of different sizes as a first feature map, a feature map of a smallest size as a second feature map, and a feature map of a size smaller than the first feature map and larger than the second feature map as a third feature map;
a compression unit configured to perform channel compression on the first feature map and the second feature map respectively to obtain a first compression feature map and a second compression feature map, the channel numbers of the first compression feature map and the second compression feature map being consistent;
an up-sampling unit configured to up-sample the second compression feature map to obtain a second up-sampling feature map, the size of the second up-sampling feature map being consistent with that of the first compression feature map, and to perform feature fusion on the second up-sampling feature map and the first compression feature map to obtain a fusion feature map;
a segmentation unit configured to input the fusion feature map into a semantic segmentation network to obtain a segmentation feature map;
and a decoding unit configured to splice the segmentation feature map and the third feature map and input the spliced result into a decoding network to obtain a road sign segmentation image and category information of the road sign displayed in the road sign segmentation image.
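The extraction and compression units of claim 5 map naturally onto a small feature pyramid with 1x1 "compression" convolutions; the sketch below assumes three pyramid levels and a compressed width of 128 channels, since the patent fixes neither the backbone nor these sizes.

    import torch.nn as nn

    class ExtractCompress(nn.Module):
        """Three-level feature extraction plus 1x1 channel compression."""

        def __init__(self, comp_ch=128):
            super().__init__()
            # Stand-in backbone: each stage halves the spatial resolution.
            self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(True))
            self.stage2 = nn.Sequential(nn.Conv2d(64, 256, 3, 2, 1), nn.ReLU(True))
            self.stage3 = nn.Sequential(nn.Conv2d(256, 512, 3, 2, 1), nn.ReLU(True))
            # 1x1 convolutions bring the first and second feature maps to the
            # same channel count, as the compression unit requires.
            self.comp_first = nn.Conv2d(64, comp_ch, 1)
            self.comp_second = nn.Conv2d(512, comp_ch, 1)

        def forward(self, fused_image):
            first = self.stage1(fused_image)   # largest  -> first feature map
            third = self.stage2(first)         # middle   -> third feature map
            second = self.stage3(third)        # smallest -> second feature map
            return self.comp_first(first), self.comp_second(second), third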
6. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-4.
7. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-4.
CN202211533230.5A 2022-12-02 2022-12-02 Road image recognition method, device, equipment and computer readable medium Active CN115546769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211533230.5A CN115546769B (en) 2022-12-02 2022-12-02 Road image recognition method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211533230.5A CN115546769B (en) 2022-12-02 2022-12-02 Road image recognition method, device, equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN115546769A CN115546769A (en) 2022-12-30
CN115546769B (en) 2023-03-24

Family

ID=84721679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211533230.5A Active CN115546769B (en) 2022-12-02 2022-12-02 Road image recognition method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN115546769B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097624A (en) * 2024-04-23 2024-05-28 GAC Aion New Energy Automobile Co Ltd Vehicle environment sensing method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504670A (en) * 2014-12-11 2015-04-08 University of Shanghai for Science and Technology Multi-scale gradient domain image fusion algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019222951A1 (en) * 2018-05-24 2019-11-28 Nokia Technologies Oy Method and apparatus for computer vision

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504670A (en) * 2014-12-11 2015-04-08 University of Shanghai for Science and Technology Multi-scale gradient domain image fusion algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Scene segmentation-based luminance adjustment for multi-exposure image fusion; Yuma Kinoshita et al.; arXiv:1903.07428v1 [cs.MM]; 2019-03-18; pp. 1-16 *
Road scene semantic segmentation based on multi-scale feature fusion; Gao Jianling et al.; Intelligent Computer and Applications (智能计算机与应用); 2022-10-31; Vol. 12, No. 10; pp. 79-85 *

Also Published As

Publication number Publication date
CN115546769A (en) 2022-12-30

Similar Documents

Publication Publication Date Title
CN112184738B (en) Image segmentation method, device, equipment and storage medium
CN111340131B (en) Image labeling method and device, readable medium and electronic equipment
CN106845470B (en) Map data acquisition method and device
CN111310770A (en) Target detection method and device
CN111967332B (en) Visibility information generation method and device for automatic driving
CN111402113B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112381717A (en) Image processing method, model training method, device, medium, and apparatus
CN115546769B (en) Road image recognition method, device, equipment and computer readable medium
CN112712036A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN111738316A (en) Image classification method and device for zero sample learning and electronic equipment
CN112036517B (en) Image defect classification method and device and electronic equipment
CN116186354B (en) Method, apparatus, electronic device, and computer-readable medium for displaying regional image
CN115565158B (en) Parking space detection method, device, electronic equipment and computer readable medium
CN113259601A (en) Video processing method and device, readable medium and electronic equipment
CN115115836A (en) Image recognition method, image recognition device, storage medium and electronic equipment
CN115375657A (en) Method for training polyp detection model, detection method, device, medium, and apparatus
CN115375656A (en) Training method, segmentation method, device, medium, and apparatus for polyp segmentation model
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN111369429B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111353536B (en) Image labeling method and device, readable medium and electronic equipment
CN114463768A (en) Form recognition method and device, readable medium and electronic equipment
CN114648712A (en) Video classification method and device, electronic equipment and computer-readable storage medium
CN111626919B (en) Image synthesis method and device, electronic equipment and computer readable storage medium
CN118038193B (en) Visual display method and device for underground cable, electronic equipment and computer medium
CN114756565B (en) Map updating method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant