CN117115772B - Image processing method, device, equipment, storage medium and program product


Info

Publication number
CN117115772B
CN117115772B
Authority
CN
China
Prior art keywords
component
image
identification
feature
sub
Prior art date
Legal status
Active
Application number
CN202311361924.XA
Other languages
Chinese (zh)
Other versions
CN117115772A (en)
Inventor
娄英欣 (Lou Yingxin)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311361924.XA priority Critical patent/CN117115772B/en
Publication of CN117115772A publication Critical patent/CN117115772A/en
Application granted granted Critical
Publication of CN117115772B publication Critical patent/CN117115772B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application discloses an image processing method, apparatus, device, storage medium, and program product, relating at least to artificial intelligence and other technologies, which improves the matching effect between images and the map updating capability. The method comprises the following steps: acquiring at least one image pair, each image pair comprising a first road image and a second road image, the first road image comprising at least one first identification component and the second road image comprising at least one second identification component; extracting, for each image pair, a feature vector for each first identification component and a feature vector for each second identification component; performing a similarity distance calculation between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component; and determining the image matching result of the corresponding image pair based on each component similarity.

Description

Image processing method, device, equipment, storage medium and program product
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a method, a device, equipment, a storage medium and a program product for image processing.
Background
In the process of collecting map road data, in order to update map information, newly collected road images need to be compared with previously collected historical road images so as to find changed landmark identifications and the like and update the map accordingly.
To judge whether a newly acquired road image matches a historical road image, the traditional approach trains a deep learning classification network on a large number of labeled data pairs, using a convolutional neural network such as ResNet or InceptionV. In this approach, after the newly collected road image and the historical road image are aligned, high-level semantic feature extraction and classification are performed on the aligned image pair by the deep learning classification network to obtain the final image matching result, which indicates whether the newly collected road image and the historical road image match.
However, the conventional matching method, which only performs image alignment processing and relies on a deep learning classification network to extract high-level semantic features from the aligned image pairs, cannot accurately determine the image matching result; the finer image information in the image pairs is not understood in detail, so the matching effect is poor and mismatches between new and old road images cause map updating errors.
Disclosure of Invention
Embodiments of the present application provide an image processing method, apparatus, device, storage medium, and program product that can accurately determine the image matching result of an image pair and improve the matching effect, thereby avoiding map updating errors caused by mismatches between new and old road images.
In a first aspect, embodiments of the present application provide an image processing method. The method comprises the following steps: acquiring at least one image pair, wherein each image pair comprises a first road image and a second road image, the acquisition time of the second road image being earlier than that of the first road image, the first road image comprising at least one first identification component and the second road image comprising at least one second identification component, each first identification component being used to indicate a landmark identification displayed in the first road image and each second identification component being used to indicate a landmark identification displayed in the second road image; extracting, for each image pair, a feature vector for each first identification component and a feature vector for each second identification component; performing a similarity distance calculation between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component; and determining the image matching result of the corresponding image pair based on each component similarity.
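For orientation only, the following minimal Python sketch outlines the claimed flow; the helper functions extract_component_vectors, component_similarity, and decide_match are hypothetical stand-ins for the steps detailed in the embodiments below, not functions defined by the disclosure.

```python
# Hypothetical end-to-end sketch of the claimed method (assumed helpers).
def match_image_pairs(image_pairs):
    results = []
    for first_img, second_img in image_pairs:  # second_img was acquired earlier
        # One feature vector per identification component (road name, arrow, ...)
        first_vecs = extract_component_vectors(first_img)
        second_vecs = extract_component_vectors(second_img)
        # Similarity distance between every (first, second) component pair
        sims = [[component_similarity(a, b) for b in second_vecs]
                for a in first_vecs]
        results.append(decide_match(sims, len(first_vecs), len(second_vecs)))
    return results
```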
In a second aspect, embodiments of the present application provide an image processing apparatus. The image processing apparatus includes, but is not limited to, a terminal device, a server, and the like. The image processing apparatus includes an acquisition unit and a processing unit. The acquisition unit is used for acquiring at least one image pair, each image pair comprises a first road image and a second road image, the acquisition time of the second road image is earlier than that of the first road image, the first road image comprises at least one first identification component, the second road image comprises at least one second identification component, each first identification component is used for indicating landmark identifications displayed in the first road image, and each second identification component is used for indicating landmark identifications displayed in the second road image. And the processing unit is used for extracting the characteristic vector of each first identification component and the characteristic vector of each second identification component. The processing unit is used for calculating the similarity distance between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component. The processing unit is used for determining an image matching result in the corresponding image pair based on the similarity of each component.
In some alternative embodiments, the processing unit is configured to: extracting image features of the first road image based on a preset feature extraction model; and extracting the image characteristics of the first road image based on at least one preset candidate detection frame to obtain the characteristic vector of each first identification component.
In other alternative embodiments, the processing unit is configured to: extracting image features of the second road image based on a preset feature extraction model; and extracting the features of the image features of the second road image based on at least one preset candidate detection frame to obtain the feature vector of each second identification component.
In other optional embodiments, the feature vector of each first identification component includes one or more of a first position feature, a first rotation angle feature, and a first semantic feature corresponding to the first identification component, and the feature vector of each second identification component includes one or more of a second position feature, a second rotation angle feature, and a second semantic feature corresponding to the second identification component; the processing unit is used for: calculating a first similar distance between the first position feature of a first sub-component and the second position feature of a second sub-component, the first sub-component being any one of the at least one first identification component and the second sub-component being any one of the at least one second identification component; calculating a second similar distance between the first rotation angle feature of the first sub-component and the second rotation angle feature of the second sub-component; calculating a third similar distance between the first semantic feature of the first sub-component and the second semantic feature of the second sub-component; and carrying out weighted summation processing on one or more of the first similar distance, the second similar distance and the third similar distance to obtain the component similarity between the first sub-component and the second sub-component.
In other alternative embodiments, the processing unit is configured to: and calculating the first position feature of the first sub-component and the second position feature of the second sub-component based on a preset similarity algorithm to obtain a first similar distance between the first position feature of the first sub-component and the second position feature of the second sub-component.
In other alternative embodiments, the processing unit is configured to: calculating differences between respective elements in the first rotation angle feature in the first sub-component and elements at the same element positions in the second rotation angle feature in the second sub-component; and calculating the sum of absolute values of each difference value to obtain a second similar distance between the first rotation angle characteristic of the first sub-component and the second rotation angle characteristic of the second sub-component.
In other alternative embodiments, the processing unit is configured to: and carrying out vector inner product calculation on the first semantic features of the first sub-component and the second semantic features of the second sub-component to obtain a third similar distance between the first semantic features of the first sub-component and the second semantic features of the second sub-component.
In other alternative embodiments, the processing unit is configured to: comparing each part similarity with a first preset threshold value respectively, and determining a target identification part pair, wherein the part similarity corresponding to the target identification part pair is larger than the first preset threshold value; counting the number of the target identification component pairs, the number of the first identification components and the number of the second identification components; an image matching result corresponding to the image pair is determined based on the number of target identification component pairs, the number of first identification components, and the number of second identification components.
In other alternative embodiments, the processing unit is configured to: calculating a first value based on the number of the target identification component pairs, the number of the first identification components and the number of the second identification components, wherein the first value is used for indicating the matching degree of the identification components in the first road image and the second road image corresponding to the image pairs; and when the first value is smaller than a second preset threshold value, determining an image matching result corresponding to the image pair as a first result, wherein the first result is used for indicating that an identification component difference exists between the first road image and the second road image.
In other alternative embodiments, the processing unit is configured to: determining a maximum number from the number of first identification components and the number of second identification components; dividing the number of target identification component pairs by the maximum number to obtain the first value.
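For illustration only, a minimal sketch of the decision logic in the three preceding embodiments; the threshold values are assumptions, not values given by the disclosure, and a production version would likely also enforce a one-to-one assignment between components (e.g., greedy or Hungarian matching) before counting pairs.

```python
# Sketch of the matching decision: count target identification component
# pairs, form the first value, and compare it with the second threshold.
# Threshold values t1 and t2 are illustrative assumptions.
def decide_match(sims, n_first, n_second, t1=0.8, t2=0.6):
    # Target pairs: component pairs whose similarity exceeds the first threshold
    target_pairs = sum(1 for row in sims for s in row if s > t1)
    # First value: target pairs divided by the larger component count
    first_value = target_pairs / max(n_first, n_second)
    # Below the second threshold -> first result: the images differ
    return "difference" if first_value < t2 else "match"
```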
In other alternative embodiments, the processing unit is further configured to: when the first value is smaller than a second preset threshold value, determining that an image matching result corresponding to the image pair is a first result, and then determining an identification component change area in the first road image and the second road image based on the first result; and updating the second road image based on the identification component change area.
In other alternative embodiments, the acquisition unit is further configured to: at least one first road image and at least one second road image are acquired before at least one image pair is acquired. And the processing unit is used for respectively carrying out image alignment processing on each first road image and each second road image in pairs to obtain an image alignment result, wherein the image alignment result comprises at least one image pair.
A third aspect of the embodiments of the present application provides an image processing device, including: a processor, an input/output (I/O) interface, and a memory. The memory is used for storing program instructions. The processor is configured to execute the program instructions in the memory to perform the image processing method corresponding to the implementation manner of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to execute the method corresponding to the embodiments of the first aspect described above.
A fifth aspect of the embodiments of the present application provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the method of the embodiments of the first aspect described above.
From the above technical solutions, the embodiments of the present application have the following advantages:
in the embodiments of the present application, at least one image pair is acquired first, each image pair comprising a first road image and a second road image, where the acquisition time of the second road image is earlier than that of the first road image. In addition, the first road image includes at least one first identification component, and each first identification component can indicate a landmark identification displayed in the first road image. Likewise, the second road image includes at least one second identification component, and each second identification component is used to indicate a landmark identification displayed in the second road image. For each image pair, the feature vector of each first identification component and the feature vector of each second identification component are extracted, and a similarity distance calculation is then performed between each feature vector of each first identification component and each feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component. In this way, the image matching result of the corresponding image pair is determined based on each component similarity. By extracting the feature vector of each first identification component and of each second identification component in the image pair, the semantics of the identification components in the image pair are understood more finely, and accurate local judgment of the aligned image pair is realized. Thus, the component similarity between the feature vectors of two identification components is calculated, the image matching result of the image pair is accurately determined based on the component similarity, the matching effect is improved, and the problem of map updating errors caused by mismatches between new and old road images can be effectively avoided.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a system framework schematic provided by an embodiment of the present application;
FIG. 2 shows a flowchart of a method of image processing provided by an embodiment of the present application;
FIG. 3A illustrates an alternative schematic diagram of an image pair provided by an embodiment of the present application;
FIG. 3B illustrates another alternative schematic diagram of an image pair provided by an embodiment of the present application;
FIG. 4 shows a schematic flow chart of feature extraction provided by an embodiment of the present application;
FIG. 5 shows a schematic flow chart of a convolutional neural network according to an embodiment of the present application;
FIG. 6 illustrates a schematic diagram of preset candidate detection boxes provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of identification component matching according to an embodiment of the present application;
FIG. 8 shows a functional block diagram of an image processing apparatus provided in an embodiment of the present application;
FIG. 9 shows a hardware configuration diagram of an image processing apparatus provided in an embodiment of the present application.
Description of the embodiments
Embodiments of the present application provide an image processing method, apparatus, device, storage medium, and program product that can accurately determine the image matching result of an image pair and improve the matching effect, thereby avoiding map updating errors caused by mismatches between new and old road images.
It will be appreciated that the specific embodiments of the present application involve related data such as user information. When the above embodiments of the present application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use, and processing of related data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of being practiced otherwise than as specifically illustrated and described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the research and advancement of artificial intelligence (AI) technology, AI has been researched and applied in various fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, autonomous driving, unmanned aerial vehicles, robots, smart medical care and smart customer service. It is believed that with the development of technology, artificial intelligence will be applied in more fields and show increasingly important value.
The image processing technology based on artificial intelligence is a technology for positioning and identifying objects in images, and is one of basic tasks in computer vision in the field of artificial intelligence. Object detection is widely used in various image processing apparatuses, and can improve recognition efficiency of objects in an image.
With the continuous development of technology, maps have become indispensable in people's daily life. Whether traveling, navigating, or searching for geographic information, the map is an essential tool. Through the map, practical information such as route, time, and traffic mode can be provided, helping people plan their trips better. A geographic information system can store, query, analyze, and display measurement data, topographic data, and other real-world geographic elements such as roads and buildings, thereby generating a map. Therefore, whenever information such as real-world geographic elements changes, the corresponding map needs to be updated in time, so as to avoid affecting people's normal trip planning due to untimely map updates.
In determining whether information such as a geographic element, e.g. a landmark, has changed, a newly acquired road image is generally matched against a historical road image. However, the conventional matching method, which only performs image alignment and relies on a deep learning classification network to extract high-level semantic features from the aligned image pairs, cannot determine the image matching result efficiently and accurately; the finer image information in the image pairs is not understood in detail, the matching effect is poor, and mismatches between new and old road images lead to map updating errors.
Accordingly, in order to solve the technical problems described above, embodiments of the present application provide a method of image processing. The image processing method provided by the embodiment of the application is realized based on artificial intelligence. Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice technology, a natural language processing technology, machine learning/deep learning and other directions.
In the embodiments of the present application, the artificial intelligence techniques mainly involve the above-mentioned directions of computer vision technology, machine learning, and the like. For example, image processing and image recognition in computer vision (CV) may be involved; deep learning in machine learning (ML) may also be involved, including artificial neural networks and the like; active learning in machine learning may also be involved.
The image processing method provided by the application can be applied to image processing equipment with data processing capability, such as terminal equipment, a server, a question-answering robot and the like. The terminal device may include, but is not limited to, a smart phone, a desktop computer, a notebook computer, a tablet computer, a smart speaker, a vehicle-mounted device, a smart watch, a wearable smart device, a smart voice interaction device, a smart home appliance, an aircraft, and the like. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, a cloud server providing cloud computing service, or the like, and the application is not particularly limited. In addition, the terminal device and the server may be directly connected or indirectly connected by wired communication or wireless communication, and the present application is not particularly limited.
The above-mentioned image processing device may be provided with the capability to implement the above-mentioned computer vision techniques. Computer vision is a science that studies how to make machines "see"; it refers to using cameras and computers instead of human eyes to perform machine vision such as object recognition, tracking, and measurement, and to further perform graphics processing so that the computer produces images more suitable for human eyes to observe or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition. In the embodiments of the present application, the image processing device can perform processing such as feature map extraction on images through computer vision technology.
In addition, the image processing device may also have machine learning capabilities. Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically involve artificial neural networks. The image processing method provided by the embodiments of the present application adopts an artificial intelligence model, mainly involving the application of a neural network, through which processing such as feature extraction of identification components in road images is realized.
The image processing method provided in the present application is applicable to the system framework shown in fig. 1. As shown in fig. 1, the system framework includes at least a photographing device and an image processing device. The photographing device can photograph the road in the driving direction of the vehicle, thereby obtaining a first image sequence at the current acquisition time, the first image sequence comprising at least one first road image. It should be noted that the first road image includes at least one first identification component, where each first identification component is configured to indicate a landmark identification displayed in the first road image, for example, but not limited to, a road name, a road indication arrow, a lane line, and the like, which is not limited in the embodiments of the present application.
After the photographing device collects the first image sequence, it may send the first image sequence, or directly send each first road image, to the image processing device. The image processing device then performs alignment processing between the received first road images and the previously acquired second road images, thereby obtaining at least one image pair. The image processing device may also acquire a second image sequence from the photographing device, and acquire at least one second road image from the second image sequence.
It should be noted that each image pair includes a first road image and a second road image, and the acquisition time of the second road image is earlier than the acquisition time of the first road image. The second road image includes at least one second identification component, and each second identification component is configured to indicate a landmark identification displayed in the second road image, for example, but not limited to, a road name, a road indication arrow, a lane line, and the like, which is not limited in the embodiments of the present application.
In this way, for each image pair, the image processing device extracts the feature vector of each first identification component and the feature vector of each second identification component, and then calculates the similarity distance between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component. Finally, the image matching result of the corresponding image pair is determined based on each component similarity, so that the image matching result determines whether the first road image and the second road image need image difference processing. Optionally, when it is known from the image matching result that an identification component change area exists between the first road image and the second road image, the second road image is updated based on the identification component change area, and the updated second road image is output, so that map information is updated based on the updated second road image. In this way, the semantics of the identification components in the image pairs can be understood more finely, the aligned image pairs can be judged locally and accurately, the matching effect is improved, and the problem of map updating errors caused by mismatches between new and old road images can be effectively avoided.
It should be noted that the photographing apparatus mentioned in fig. 1 may be an independent photographing device having an image capturing function and a data transmission function; alternatively, the photographing device mentioned in fig. 1 may be a photographing function module disposed in a vehicle-mounted device or other terminal devices, which is not limited in the embodiment of the present application. Fig. 1 is only an example of a photographing apparatus having an image capturing function and a data transmission function.
In addition, the image processing method provided by the embodiment of the application can be applied to application scenes such as artificial intelligence, cloud technology, internet of vehicles, intelligent traffic and the like, and the embodiment of the application is not limited.
A method for image processing according to an embodiment of the present application is described below with reference to the accompanying drawings. Fig. 2 shows a flowchart of a method for image processing according to an embodiment of the present application. As shown in fig. 2, the image processing method may include the steps of:
201. at least one image pair is obtained, each image pair comprises a first road image and a second road image, the acquisition time of the second road image is earlier than that of the first road image, the first road image comprises at least one first identification component, the second road image comprises at least one second identification component, each first identification component is used for indicating landmark identifications displayed in the first road image, and each second identification component is used for indicating landmark identifications displayed in the second road image.
In this example, at least one first road image at the current acquisition time can be acquired by photographing the road ahead of the traveling vehicle with the photographing device, forming a first image sequence. Subsequently, the photographing device transmits the first image sequence to the image processing device. In addition, before the current acquisition time, the photographing device likewise acquired at least one second road image, formed a second image sequence, and transmitted it to the image processing device. It should be noted that the acquisition time of each second road image is earlier than the acquisition time of each first road image. For example, a vehicle traveled on road A on August 11, 2023; at that time, road A could be photographed by a photographing device disposed in the vehicle, so as to acquire at least one second road image. Similarly, a vehicle travels on road A on October 11, 2023; at this time, road A can be photographed by the photographing device disposed in the vehicle, so as to acquire at least one first road image.
In this way, after the image processing device receives the first image sequence and the second image sequence sent by the photographing device, it can perform image alignment processing between each first road image in the first image sequence and each second road image in the second image sequence, so as to obtain an image alignment result. As an illustrative description, taking any one first road image (e.g., the first image) and any one second road image (e.g., the second image) as an example, the image processing device may first acquire a first motion offset of the first image relative to the second image in a first motion direction, and a second motion offset in a second motion direction. The first and second motion directions are not on the same straight line. In this way, the image processing device determines the alignment detection result of the first image according to the first motion offset and the second motion offset, and then obtains the image alignment result.
In the image alignment result, at least one image pair is included, and each image pair includes a first road image and a second road image. Thereby, the image processing apparatus can acquire at least one image pair from the image alignment result.
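The disclosure does not fix a particular algorithm for estimating the two motion offsets; as one illustrative assumption, phase correlation could be used, as in the following sketch (OpenCV's cv2.phaseCorrelate is a real API, but its use here is an assumption, not the patent's stated method).

```python
import cv2
import numpy as np

# Assumed approach: estimate horizontal/vertical offsets of the first image
# relative to the second image via phase correlation.
def alignment_offsets(first_img, second_img):
    g1 = cv2.cvtColor(first_img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g2 = cv2.cvtColor(second_img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    (dx, dy), _response = cv2.phaseCorrelate(g2, g1)
    return dx, dy  # first and second motion offsets (non-collinear directions)
```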
For example, fig. 3A shows an alternative schematic diagram of an image pair provided by an embodiment of the present application. As shown in fig. 3A, any one image pair includes a first road image and a second road image. For example, the first road image acquired on October 11, 2023 includes at least one first identification component, such as the "southwest new road" identification component, the "central first street" identification component, the "Guanglan road" identification component, and the "college north road" identification component shown in fig. 3A. The landmark identification displayed in the first road image can be known through each first identification component; for example, through the "southwest new road" identification component, the landmark identification of the "southwest new road" displayed in the first road image can be known, and the course of the "southwest new road" on the current road can be further clarified.
Similarly, as can be seen from fig. 3A, the second road image acquired on August 11, 2023 also includes at least one second identification component, for example, the "southwest new road" identification component, the "Jiangda road" identification component, the "first major road" identification component, the "Guanglan road" identification component, the "college north road" identification component, and the like, which is not limited in the present application. Through each second identification component, the landmark identification displayed in the second road image can be known; for example, through the "first major road" identification component, the landmark identification of the "first major road" displayed in the second road image can be known, and the course of the "first major road" on the current road can be further clarified.
Still alternatively, fig. 3B shows another alternative schematic diagram of an image pair provided by an embodiment of the present application. In the image pair shown in fig. 3B, the acquired second road image includes three second identification components, namely the "second street" identification component, the "Pumin road" identification component, and the "Xin'an road" identification component. The acquired first road image includes three first identification components, namely the "second street" identification component, the "Puhe road" identification component, and the "Xin'an road" identification component.
It should be noted that, the first road image may include, but is not limited to, a traffic sign, a building, etc., and the embodiment of the present application is not limited thereto. In addition, the described second road image may also include, but is not limited to, traffic signs, buildings, etc., and is not specifically limited to. In addition, the "southwest new road" sign component and the like shown in fig. 3A and the "second street" sign component and the like shown in fig. 3B may be other sign components in practical application, and the embodiment of the present application is not limited thereto.
202. For each image pair, a feature vector for each first identification component is extracted, and a feature vector for each second identification component is extracted.
In this example, the first road image and the second road image in the above-described image pair have already undergone image alignment processing, i.e., road images of the same scene at different acquisition times have been preliminarily matched. However, even if road images of the same scene are matched, it cannot be directly concluded that the road images acquired at two different acquisition times describe the same road signboard. For example, the image pairs shown in fig. 3A and 3B each show the same scene, but the content of the identification components in the road images can change. For example, the image pair shown in fig. 3A exhibits newly added identification components such as road names (e.g., "central first street"), and the image pair shown in fig. 3B exhibits changed identification components such as road names (e.g., "Pumin road" versus "Puhe road").
Therefore, to determine whether the first road image and the second road image in the image pair depict the same physical location or describe the same road signboard, the image processing device needs to perform a more detailed comparison of the road images in the image pair. For example, taking any one image pair as an example, after obtaining each image pair, the image processing device may extract the feature vector of each first identification component in the first road image and the feature vector of each second identification component in the second road image of that pair.
Illustratively, fig. 4 shows a schematic flow chart of feature extraction provided in an embodiment of the present application. As shown in fig. 4, after obtaining the image pair, the image processing apparatus may use the first road image and the second road image in the image pair as inputs of a convolutional neural network, respectively, so as to extract image features of the corresponding road image through the convolutional neural network.
In particular, the convolutional neural network illustrated in fig. 4 described above can be understood with reference to the network structure schematic illustrated in fig. 5. As shown in fig. 5, the convolutional neural network includes at least a feature extraction layer and a candidate frame selection layer. Also, the feature extraction layer includes a preset feature extraction model, which is composed of a convolution layer (conv), a normalization layer (batch normalization, BN), and an activation layer (ReLU).
After each image pair is obtained, the image processing device may input the first road image into a convolution layer in a preset feature extraction model, so as to extract a corresponding feature vector through the convolution layer. Further, the feature vector output by the convolution layer is used as the input of the normalization layer, so that the feature vector output by the convolution layer is normalized according to normal distribution through the normalization layer, noise features in the feature vector are filtered, and the filtered feature vector is obtained. And finally, taking the filtered feature vector output by the normalization layer as the input of the activation layer, so as to finish nonlinear mapping processing on the filtered feature vector through the activation layer, thereby extracting and obtaining the image features of the first road image.
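A minimal PyTorch sketch of the preset feature extraction model described above (convolution, then batch normalization, then ReLU); the channel counts and kernel size are assumptions, since the disclosure does not specify them.

```python
import torch.nn as nn

# Sketch of the conv -> BN -> ReLU feature extraction layer; sizes assumed.
class FeatureExtraction(nn.Module):
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)  # normalizes / filters noise features
        self.relu = nn.ReLU(inplace=True)       # nonlinear mapping

    def forward(self, road_image):
        return self.relu(self.bn(self.conv(road_image)))
```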
After the image processing device extracts the image features of the first road image, the image processing device may further perform feature extraction on the image features of the first road image based on at least one preset candidate detection frame through a model such as a region extraction network (region proposal network, RPN) to obtain feature vectors of each first identification component. The feature vector of each first identification component can characterize the corresponding first identification component.
For example, fig. 6 shows a schematic diagram of preset candidate detection boxes provided by an embodiment of the present application. As shown in fig. 6, each feature point in the image features of the first road image may be taken as a center point, with aspect ratios of 1:1, 2:1 and 1:2 and feature scales of 1, 2 and 3, yielding 9 preset candidate detection boxes. Thus, the image processing device performs feature extraction on the image features of the first road image based on the 9 preset candidate detection boxes, and thereby extracts the feature vector of each first identification component. For example, taking the first road image shown in fig. 3A as an example, the feature vectors of the 4 first identification components, i.e., the "southwest new road" identification component, the "central first street" identification component, the "Guanglan road" identification component, and the "college north road" identification component, may be extracted and represented, for example, by F11 to F14.
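A sketch of generating the 9 preset candidate detection boxes per feature point (aspect ratios 1:1, 2:1, 1:2 at three scales); the base box size is an assumption.

```python
import numpy as np

# 9 candidate boxes centred on a feature point: 3 aspect ratios x 3 scales.
def candidate_boxes(cx, cy, base=16):
    boxes = []
    for scale in (1, 2, 3):
        for ratio in (1.0, 2.0, 0.5):  # height:width of 1:1, 2:1, 1:2
            w = base * scale / np.sqrt(ratio)
            h = base * scale * np.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes  # 9 (x1, y1, x2, y2) boxes around the feature point
```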
It should be noted that the feature vector of each first identification component may include, but is not limited to, one or more of the first position feature, the first rotation angle feature, and the first semantic feature of the corresponding first identification component, which is not limited in the embodiments of the present application and may be understood with reference to the foregoing description of fig. 4. The mentioned first position feature can be used to clarify the coordinate position of the first identification component in the first road image. The first rotation angle feature is used to indicate the rotation angle attribute of the corresponding first identification component in the first road image. The first semantic feature can indicate the semantic information of the corresponding first identification component.
Likewise, as to how to extract the feature vector of each second identification component in the second road image, the image processing device may also input the second road image into the convolution layer of the preset feature extraction model shown in fig. 5 to extract the corresponding feature vector through the convolution layer. Further, the feature vector output by the convolution layer is used as the input of the normalization layer, so that it is normalized according to a normal distribution, noise features in the feature vector are filtered out, and the filtered feature vector is obtained. Finally, the filtered feature vector is used as the input of the activation layer to complete the nonlinear mapping processing, thereby extracting the image features of the second road image. In this way, after the image processing device extracts the image features of the second road image, it may further perform feature extraction on those image features based on at least one preset candidate detection box through a network model such as the RPN shown in fig. 5, to obtain the feature vector of each second identification component. It should be noted that the preset candidate detection boxes described here may also be understood with reference to the foregoing content shown in fig. 6, which will not be repeated here.
For example, taking the second road image shown in fig. 3A as an example, the feature vectors of the 5 second identification components, i.e., the "southwest new road" identification component, the "Jiangda road" identification component, the "first major road" identification component, the "Guanglan road" identification component, and the "college north road" identification component, may be extracted and represented, for example, by F21 to F25.
It should be noted that the feature vector of each second identification component may include, but is not limited to, one or more of the second position feature, the second rotation angle feature, and the second semantic feature of the corresponding second identification component, which is not limited in the embodiments of the present application and may be understood with reference to the foregoing description of fig. 4. The mentioned second position feature can be used to clarify the coordinate position of the second identification component in the second road image. The second rotation angle feature can be used to indicate the rotation angle attribute of the corresponding second identification component in the second road image. The second semantic feature can indicate the semantic information of the corresponding second identification component.
203. And calculating the similarity distance between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component.
In this example, in the process of performing fine comparison, for each image pair, after extracting the feature vector of each first identification component and the feature vector of each second identification component, the image processing apparatus may perform similarity distance calculation between each feature vector of each first identification component and each feature vector of each second identification component, so as to obtain component similarity between the corresponding first identification component and the corresponding second identification component.
For example, fig. 7 shows a schematic flow chart of identification component matching provided in an embodiment of the present application. As shown in fig. 7, on the basis of the image pair shown in fig. 3A, the feature vectors of the 4 first identification components in the first road image, i.e., the "southwest new road", "central first street", "Guanglan road" and "college north road" identification components, can be extracted in the manner of the foregoing step 202 as F11, F12, F13 and F14. Similarly, the feature vectors of the 5 second identification components in the second road image, namely the "southwest new road", "Jiangda road", "first major road", "Guanglan road" and "college north road" identification components, are extracted as F21, F22, F23, F24 and F25.
Further, the image processing device may calculate the similarity distances between F11 and each of F21, F22, F23, F24 and F25, denoted L_F11F21, L_F11F22, L_F11F23, L_F11F24 and L_F11F25. These 5 similar distances reflect the component similarities between the "southwest new road" identification component in the first road image and, respectively, the "southwest new road", "Jiangda road", "first major road", "Guanglan road" and "college north road" identification components in the second road image. For example, the similar distance L_F11F21 reflects the component similarity between the "southwest new road" identification component in the first road image and the "southwest new road" identification component in the second road image.
Similarly, following the same processing idea, the image processing device calculates the similar distances between F12 and each of F21 to F25, thereby determining the component similarities between the "central first street" identification component in the first road image and, respectively, the "southwest new road", "Jiangda road", "first major road", "Guanglan road" and "college north road" identification components in the second road image. Likewise, it calculates the similar distances between F13 and each of F21 to F25, determining the component similarities between the "Guanglan road" identification component in the first road image and the five identification components in the second road image, and the similar distances between F14 and each of F21 to F25, determining the component similarities between the "college north road" identification component in the first road image and the five identification components in the second road image.
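In code, this all-pairs comparison is a simple nested loop; the sketch below assumes a distance function like the component similarity defined later in this section.

```python
# Pairwise comparison: every first-image component vector against every
# second-image component vector, e.g. [F11..F14] x [F21..F25] -> 4 x 5 grid.
def similarity_matrix(first_vecs, second_vecs, distance):
    return [[distance(f1, f2) for f2 in second_vecs] for f1 in first_vecs]
```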
As can be seen from the description of the foregoing step 202, the feature vector of each first identification component includes one or more of a first position feature, a first rotation angle feature, and a first semantic feature of the corresponding first identification component; and the feature vector of each second identification component includes one or more of a second position feature, a second rotation angle feature, and a second semantic feature of the corresponding second identification component. Taking any one of the at least one first identification component (i.e. the first sub-component) and any one of the at least one second identification component (i.e. the second sub-component) as an example, the process for how to calculate the component similarity between the first identification component and the second identification component can be implemented in the following way, namely:
the image processing device may calculate a first similar distance between the first positional characteristic of the first sub-component and the second positional characteristic of the second sub-component. As an exemplary illustration, the first position feature of the first pair of first sub-components and the second position feature of the second sub-components may be calculated based on a predetermined similarity algorithm, thereby calculating a first similarity distance between the first position feature of the first sub-components and the second position feature of the second sub-components. The described preset similarity algorithm includes, but is not limited to, cosine similarity algorithm, euclidean distance algorithm, etc., and the embodiment of the present application is not limited thereto.
Likewise, the image processing apparatus may calculate a second similar distance between the first rotation angle feature of the first sub-component and the second rotation angle feature of the second sub-component. As an illustrative description, the second similar distance may be calculated by means of a 1-norm algorithm. Specifically, the image processing apparatus first calculates the differences between the elements of the first rotation angle feature of the first sub-component and the elements at the same positions in the second rotation angle feature of the second sub-component. It then sums the absolute values of these differences to obtain the second similar distance between the first rotation angle feature of the first sub-component and the second rotation angle feature of the second sub-component.
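A matching sketch of the 1-norm step, under the same assumptions:

```python
import numpy as np

def rotation_distance(phi_i, phi_j):
    # Second similar distance: element-wise differences between the two
    # rotation angle features, then the sum of their absolute values (1-norm).
    # For scalar angles this reduces to |phi_i - phi_j|.
    diff = (np.atleast_1d(np.asarray(phi_i, dtype=float))
            - np.atleast_1d(np.asarray(phi_j, dtype=float)))
    return float(np.sum(np.abs(diff)))
```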
Likewise, the image processing device may also calculate a third phase distance between the first semantic feature of the first sub-component and the second semantic feature of the second sub-component. As an exemplary description, the image processing apparatus may perform a vector inner product calculation on the first semantic feature of the first sub-component and the second semantic feature of the second sub-component, thereby calculating a third similar distance between the first semantic feature of the first sub-component and the second semantic feature of the second sub-component.
Thus, after calculating the first similar distance, the second similar distance, and the third similar distance, the image processing apparatus performs weighted summation on one or more of them to obtain the component similarity between the first sub-component and the second sub-component. For example, when all three similar distances are used, the component similarity between the first sub-component and the second sub-component may be expressed as:
s = w_1 · ||(x_i, y_i) - (x_j, y_j)||_2 + w_2 · |φ_i - φ_j| + w_3 · (F_i ⊕ F_j)

where s represents the component similarity between the first sub-component and the second sub-component; w_1 represents the weight of the first similar distance; (x_i, y_i) represents the first position feature of the first sub-component; (x_j, y_j) represents the second position feature of the second sub-component; ||(x_i, y_i) - (x_j, y_j)||_2 represents the first similar distance; φ_i represents the first rotation angle feature of the first sub-component; φ_j represents the second rotation angle feature of the second sub-component; |φ_i - φ_j| represents the second similar distance; F_i represents the first semantic feature of the first sub-component; F_j represents the second semantic feature of the second sub-component; (F_i ⊕ F_j) represents the third similar distance, with ⊕ denoting the vector inner product; w_2 represents the weight of the second similar distance; w_3 represents the weight of the third similar distance; i denotes any one of the at least one first identification component (i.e., the aforementioned first sub-component); and j denotes any one of the at least one second identification component (i.e., the aforementioned second sub-component).
Incidentally, the values of w_1, w_2, and w_3 mentioned above may be set freely according to processing requirements, provided only that w_1 + w_2 + w_3 = 1.
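Combining the three distances, a minimal sketch of the weighted summation might read as follows (the helper inlines the three computations above; the function name and the default weights, taken from the worked example below, are assumptions):

```python
import numpy as np

def component_similarity(pos_i, phi_i, f_i, pos_j, phi_j, f_j,
                         w1=0.5, w2=0.2, w3=0.3):
    # s = w1 * ||(x_i, y_i) - (x_j, y_j)||_2   (first similar distance)
    #   + w2 * |phi_i - phi_j|                 (second similar distance)
    #   + w3 * (F_i inner product F_j)         (third similar distance)
    # The weights w1 + w2 + w3 must sum to 1.
    d_pos = np.linalg.norm(np.asarray(pos_i, dtype=float)
                           - np.asarray(pos_j, dtype=float))
    d_rot = float(np.sum(np.abs(np.atleast_1d(np.asarray(phi_i, dtype=float))
                                - np.atleast_1d(np.asarray(phi_j, dtype=float)))))
    d_sem = float(np.dot(np.asarray(f_i, dtype=float),
                         np.asarray(f_j, dtype=float)))
    return float(w1 * d_pos + w2 * d_rot + w3 * d_sem)
```

Note that, as defined, the first two terms shrink as the components become more alike while the inner-product term grows; the text nevertheless treats the weighted sum as a component similarity, so in practice the weights and any feature normalization would be chosen so that a larger s indicates a better match, as in Table 1.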
For example, taking the feature vectors F11 and F22 shown in fig. 7 (i.e., i = 1, j = 2) as an example, the first position feature, first rotation angle feature, and first semantic feature of the "Southwest New Road" identification component obtained from the feature vector F11 are (x_1, y_1), φ_1, and F_1; and the second position feature, second rotation angle feature, and second semantic feature of the "Jiangda Road" identification component obtained from the feature vector F22 are (x_2, y_2), φ_2, and F_2. Let the values of w_1, w_2, and w_3 be 0.5, 0.2, and 0.3, respectively. According to the above formula, the similarity distance between F11 and F22 is 0.5 · ||(x_1, y_1) - (x_2, y_2)||_2 + 0.2 · |φ_1 - φ_2| + 0.3 · (F_1 ⊕ F_2). That is, the component similarity between the "Southwest New Road" identification component and the "Jiangda Road" identification component equals this value, for example 0.3.
Based on similar ideas, the component similarity between each first identification component and each second identification component in fig. 7 can be calculated pairwise. As a schematic description, this can be understood with reference to Table 1 below:
TABLE 1 (rows: first identification components in the first road image; columns: second identification components in the second road image)

Component similarity    Southwest New Road   Jiangda Road   First Avenue   Guanglan Road   College North Road
Southwest New Road      1                    0.3            –              0.2             0.4
Central First Street    0.6                  0.4            0.3            0.1             0.2
Guanglan Road           0.3                  0.6            0.3            0.7             0.4
College North Road      0.4                  0.1            0.2            0.5             1
As can be seen from Table 1, the component similarities between the "Southwest New Road" identification component in the first road image and the 5 second identification components in the second road image (the "Southwest New Road", "Jiangda Road", "First Avenue", "Guanglan Road", and "College North Road" identification components) are 1, 0.3, 0.2, and 0.4, respectively. Similarly, the component similarities between the "Central First Street" identification component in the first road image and the same 5 second identification components are 0.6, 0.4, 0.3, 0.1, and 0.2, respectively. Similarly, for the "Guanglan Road" identification component in the first road image they are 0.3, 0.6, 0.3, 0.7, and 0.4, respectively; and for the "College North Road" identification component they are 0.4, 0.1, 0.2, 0.5, and 1, respectively.
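A sketch of how such a pairwise table could be computed, reusing the component_similarity helper sketched above (the component names come from the example, while the feature values below are hypothetical placeholders, not the embodiment's data):

```python
import numpy as np

# Hypothetical per-component features: (position, rotation angle, semantic vector).
first_components = {
    "Southwest New Road": ((120, 40), 0.1, np.array([0.2, 0.9, 0.1])),
    "Central First Street": ((300, 60), 0.0, np.array([0.7, 0.1, 0.3])),
    # ... remaining first identification components
}
second_components = {
    "Southwest New Road": ((118, 43), 0.1, np.array([0.2, 0.8, 0.2])),
    "Jiangda Road": ((250, 80), 0.3, np.array([0.1, 0.2, 0.9])),
    # ... remaining second identification components
}

# One entry per (first component, second component) pair, as in Table 1.
similarity_table = {
    (name_i, name_j): component_similarity(pos_i, phi_i, f_i, pos_j, phi_j, f_j)
    for name_i, (pos_i, phi_i, f_i) in first_components.items()
    for name_j, (pos_j, phi_j, f_j) in second_components.items()
}
```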
204. An image matching result for the corresponding image pair is determined based on each component similarity.
In this example, after determining the component similarity of each pair of identification components in each image pair, the image processing apparatus may determine the image matching result of the corresponding image pair based on these component similarities. The image matching result can be used to judge whether the first road image and the second road image in the corresponding image pair depict the same scene, thereby providing a basis for subsequently judging whether image difference processing is needed.
As an exemplary description, the determination of the image matching result of a corresponding image pair based on each component similarity can be understood as follows:
First, the image processing apparatus may compare each component similarity with a first preset threshold, and determine each identification component pair whose component similarity is greater than the first preset threshold as a target identification component pair. Each target identification component pair comprises a first identification component and a second identification component.
For example, assuming the first preset threshold is 0.6, comparing each component similarity in Table 1 with this threshold shows that 3 pairs exceed 0.6: the "Southwest New Road" pair (similarity 1), the "Guanglan Road" pair (similarity 0.7), and the "College North Road" pair (similarity 1). These 3 identification component pairs are therefore taken as the target identification component pairs.
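Continuing the sketch, the target identification component pairs fall out of the table with a simple threshold filter (0.6 is the illustrative first preset threshold from the example above):

```python
FIRST_PRESET_THRESHOLD = 0.6  # illustrative value from the example

target_pairs = [
    pair for pair, s in similarity_table.items() if s > FIRST_PRESET_THRESHOLD
]
num_target_pairs = len(target_pairs)
```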
Further, the image processing apparatus counts the number of target identification component pairs, the number of first identification components, and the number of second identification components, for example, the number of target identification component pairs is 3, the number of first identification components is 4, and the number of second identification components is 5.
In this way, the image processing apparatus determines the image matching result of the corresponding image pair based on the number of target identification component pairs, the number of first identification components, and the number of second identification components. For example, in determining the image matching result, the image processing apparatus may first calculate a first value from these three quantities. As an illustrative description, the image processing apparatus may determine the maximum number from the number of first identification components and the number of second identification components, and then divide the number of target identification component pairs by this maximum number to obtain the first value, that is, first value = (number of target identification component pairs) / max(number of first identification components, number of second identification components). For example, with 3 target identification component pairs, 4 first identification components, and 5 second identification components, the first value is 3 / max(4, 5) = 3/5 = 0.6. The first value is used to indicate the degree to which the identification components in the first road image and the second road image of the corresponding image pair match.
In this way, after the first value is calculated, it is compared with a second preset threshold, and the corresponding image matching result is determined based on the comparison result; based on the image matching result, it is then determined whether image difference processing is needed for the corresponding image pair.
For example, when the first value is smaller than the second preset threshold, this indicates a large difference between the identification components of the first road image and the second road image. For example, assuming the second preset threshold is 0.8, the first value calculated above satisfies 0.6 < 0.8, indicating a large difference between the identification components of the two images. In this case, the image matching result of the corresponding image pair is determined to be the first result, which indicates that an identification component difference exists between the first road image and the second road image. In some optional examples, after determining that the image matching result is the first result, the image processing apparatus may further determine an identification component change area in the first road image and the second road image based on the first result, and then update the second road image based on the identification component change area, so as to complete the update processing of the map information.
Otherwise, when the first value is greater than or equal to the second preset threshold, this indicates that there is no difference, or only a negligible difference, between the identification components of the first road image and the second road image; in this case, the image matching result of the corresponding image pair is determined to be the second result. That is, the second result indicates that there is no, or a negligible, identification component difference between the first road image and the second road image.
It should be noted that the above mentioned second preset threshold may be determined according to circumstances, and the embodiments of the present application are not limited to the above description.
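Putting the decision step together, a minimal end-to-end sketch might look as follows (the function name, return labels, and the 0.8 threshold are illustrative assumptions from the example above, not the embodiment's API):

```python
def image_matching_result(num_target_pairs, num_first, num_second,
                          second_threshold=0.8):
    # First value: matched pairs relative to the larger component count.
    first_value = num_target_pairs / max(num_first, num_second)
    if first_value < second_threshold:
        return "first result"   # identification component difference exists
    return "second result"      # no (or negligible) difference

# Example from the text: 3 pairs, 4 and 5 components -> 0.6 < 0.8 -> first result.
assert image_matching_result(3, 4, 5) == "first result"
```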
In the embodiment of the application, by extracting the feature vector of each first identification component and each second identification component in the image pair, the semantics of the identification components are understood more precisely, enabling accurate local judgment on the aligned image pair. On this basis, the component similarity between the feature vectors of two identification components is calculated, and the image matching result of the image pair is accurately determined from these component similarities. This improves the matching effect and effectively alleviates map update errors caused by mismatches between new and old road images, thereby improving the map updating capability.
The foregoing description of the solution provided in the embodiments of the present application has been mainly presented in terms of a method. It should be understood that, in order to implement the above-described functions, hardware structures and/or software modules corresponding to the respective functions are included. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application may divide the functional modules of the apparatus according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
Next, an image processing apparatus according to an embodiment of the present application will be described in detail, and fig. 8 is a schematic diagram of an embodiment of the image processing apparatus according to an embodiment of the present application. As shown in fig. 8, the image processing apparatus may include an acquisition unit 801 and a processing unit 802.
The acquiring unit 801 is configured to acquire at least one image pair, where each image pair includes a first road image and a second road image, and the second road image has a collection time earlier than that of the first road image, and the first road image includes at least one first identification component, and the second road image includes at least one second identification component, where each first identification component is configured to indicate a landmark identifier displayed in the first road image, and each second identification component is configured to indicate a landmark identifier displayed in the second road image. It is specifically understood that the foregoing description of step 201 in fig. 2 is referred to, and details are not repeated herein.
The processing unit 802 is configured to extract a feature vector of each first identification component, and a feature vector of each second identification component. It is specifically understood that the foregoing description of step 202 in fig. 2 is referred to, and details are not repeated herein.
And a processing unit 802, configured to perform similarity distance calculation between each feature vector of each first identification component and each feature vector of each second identification component, so as to obtain a component similarity between the corresponding first identification component and the corresponding second identification component. It is specifically understood that the foregoing description of step 203 in fig. 2 is referred to, and details are not repeated herein.
A processing unit 802 for determining an image matching result in the corresponding image pair based on each component similarity. It is specifically understood that the foregoing description of step 204 in fig. 2 is referred to, and details are not repeated herein.
In some alternative embodiments, the processing unit 802 is configured to: extracting image features of the first road image based on a preset feature extraction model; and extracting the image characteristics of the first road image based on at least one preset candidate detection frame to obtain the characteristic vector of each first identification component.
In other alternative embodiments, the processing unit 802 is configured to: extracting image features of the second road image based on a preset feature extraction model; and extracting the features of the image features of the second road image based on at least one preset candidate detection frame to obtain the feature vector of each second identification component.
In other alternative embodiments, the feature vector of each first identification component includes one or more of a first position feature, a first rotation angle feature, and a first semantic feature of the corresponding first identification component, and the feature vector of each second identification component includes one or more of a second position feature, a second rotation angle feature, and a second semantic feature of the corresponding second identification component; the processing unit 802 is configured to: calculate a first similar distance between a first position feature of a first sub-component and a second position feature of a second sub-component, the first sub-component being any one of the at least one first identification component, the second sub-component being any one of the at least one second identification component; calculate a second similar distance between the first rotation angle feature of the first sub-component and the second rotation angle feature of the second sub-component; calculate a third similar distance between the first semantic feature of the first sub-component and the second semantic feature of the second sub-component; and perform weighted summation on one or more of the first similar distance, the second similar distance, and the third similar distance to obtain the component similarity between the first sub-component and the second sub-component.
In other alternative embodiments, the processing unit 802 is configured to: and calculating the first position feature of the first sub-component and the second position feature of the second sub-component based on a preset similarity algorithm to obtain a first similar distance between the first position feature of the first sub-component and the second position feature of the second sub-component.
In other alternative embodiments, the processing unit 802 is configured to: calculating differences between the elements in the first rotation angle feature in the first sub-component and the elements at the same element positions in the second rotation angle feature in the second sub-component; the sum of the absolute values of each difference is calculated to obtain a second similar distance between the first rotation angle characteristic of the first sub-component and the second rotation angle characteristic of the second sub-component.
In other alternative embodiments, the processing unit 802 is configured to: and carrying out vector inner product calculation on the first semantic features of the first sub-component and the second semantic features of the second sub-component to obtain a third similar distance between the first semantic features of the first sub-component and the second semantic features of the second sub-component.
In other alternative embodiments, the processing unit 802 is configured to: comparing the similarity of each part with a first preset threshold value respectively, and determining a target identification part pair, wherein the similarity of the parts corresponding to the target identification part pair is larger than the first preset threshold value; counting the number of target identification component pairs, the number of first identification components and the number of second identification components; an image matching result for the corresponding image pair is determined based on the number of target identification component pairs, the number of first identification components, and the number of second identification components.
In other alternative embodiments, the processing unit 802 is configured to: calculating a first value based on the number of target identification component pairs, the number of first identification components, and the number of second identification components, the first value being used to indicate a degree of matching of the identification components in the first road image and the second road image of the corresponding image pair; and when the first value is smaller than a second preset threshold value, determining an image matching result of the corresponding image pair as a first result, wherein the first result is used for indicating that the identification component difference exists between the first road image and the second road image.
In other alternative embodiments, the processing unit 802 is configured to: determining a maximum number from the number of first identification components and the number of second identification components; the number of target identification component pairs is divided by the maximum number to obtain a first value.
In other alternative embodiments, the processing unit 802 is further configured to: when the first value is smaller than a second preset threshold value, determining that the image matching result of the corresponding image pair is a first result, and determining an identification component change area in the first road image and the second road image based on the first result; the second road image is image-updated based on the identification component change region.
In other alternative embodiments, the obtaining unit 801 is further configured to: at least one first road image and at least one second road image are acquired before at least one image pair is acquired. And a processing unit 802, configured to perform image alignment processing on each first road image and each second road image respectively, so as to obtain an image alignment result, where the image alignment result includes at least one image pair.
The image processing apparatus in the embodiment of the present application is described above from the viewpoint of a modularized functional entity, and the image processing device in the embodiment of the present application is described below from the viewpoint of hardware processing. Fig. 9 is a schematic structural diagram of an image processing device provided in an embodiment of the present application. Image processing devices may differ considerably in configuration or performance, and include, but are not limited to, the aforementioned image processing devices. The image processing device may include at least one processor 901, communication lines 907, a memory 903, and at least one communication interface 904.
The processor 901 may be a general purpose central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling the execution of programs in accordance with aspects of the present application.
Communication line 907 may include a pathway to transfer information between the aforementioned components.
The communication interface 904 uses any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, a radio access network (radio access network, RAN), or a wireless local area network (wireless local area networks, WLAN).
The memory 903 may be a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that may store information and instructions, and the memory may be stand-alone and coupled to the processor via a communication line 907. The memory may also be integrated with the processor.
The memory 903 is used for storing computer-executable instructions for executing the embodiments of the present application, and is controlled by the processor 901 to execute the instructions. The processor 901 is configured to execute computer-executable instructions stored in the memory 903, thereby implementing the method for image processing provided in the above-described embodiments of the present application.
Alternatively, the computer-executable instructions in the embodiments of the present application may be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In a specific implementation, as an embodiment, the image processing apparatus may include a plurality of processors, such as the processor 901 and the processor 902 in fig. 9. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, as an embodiment, the image processing device may further include an output device 905 and an input device 906. The output device 905 communicates with the processor 901 and may display information in a variety of ways. The input device 906, in communication with the processor 901, may receive input of a target object in a variety of ways. For example, the input device 906 may be a mouse, a touch screen device, a sensing device, or the like.
The image processing apparatus described above may be a general-purpose device or a special-purpose device. In a specific implementation, the image processing apparatus may be a server, a terminal, or the like, or a device having a similar structure in fig. 9. The embodiment of the present application does not limit the type of the image processing apparatus.
Note that the processor 901 in fig. 9 may cause the image processing apparatus to execute the method in the method embodiment corresponding to fig. 2 by calling the computer-executable instructions stored in the memory 903.
In particular, the functions/implementation of the processing unit 802 in fig. 8 may be implemented by the processor 901 in fig. 9 invoking computer executable instructions stored in the memory 903. The function/implementation procedure of the acquisition unit 801 in fig. 8 can be implemented by the communication interface 904 in fig. 9.
The present application also provides a computer storage medium storing a computer program for electronic data exchange, the computer program causing a computer to execute some or all of the steps of any one of the image processing methods described in the above method embodiments.
The present application also provides a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the image processing methods described in the method embodiments above.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof, and when implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer-executable instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are fully or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). Computer readable storage media can be any available media that can be stored by a computer or data storage devices such as servers, data centers, etc. that contain an integration of one or more available media. Usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., SSD)), or the like.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (22)

1. A method of image processing, comprising:
acquiring at least one image pair, wherein each image pair comprises a first road image and a second road image, the acquisition time of the second road image is earlier than that of the first road image, the first road image comprises at least one first identification component in a traffic sign, the second road image comprises at least one second identification component in the traffic sign, each first identification component is used for indicating a landmark identifier displayed in the first road image, and each second identification component is used for indicating a landmark identifier displayed in the second road image;
Extracting, for each of the image pairs, a feature vector of each of the first identification components, and a feature vector of each of the second identification components, wherein the feature vector of the first identification component includes one or more of a first position feature, a first rotation angle feature, and a first semantic feature corresponding to the first identification component, the feature vector of the second identification component includes one or more of a second position feature, a second rotation angle feature, and a second semantic feature corresponding to the second identification component, the position feature is used to explicitly identify a coordinate position of the component in the road image, the rotation angle feature is used to explicitly identify a rotation angle attribute of the component in the road image, and the semantic feature indicates semantic information of the corresponding identification component;
performing similarity distance calculation between the feature vector of each first identification component and the feature vector of each second identification component to obtain component similarity between the corresponding first identification component and the corresponding second identification component;
comparing each part similarity with a first preset threshold value respectively, and determining a target identification part pair, wherein the part similarity corresponding to the target identification part pair is larger than the first preset threshold value;
Counting the number of the target identification component pairs, the number of the first identification components and the number of the second identification components;
determining an image matching result corresponding to the image pair based on the number of target identification component pairs, the number of first identification components, and the number of second identification components, including: determining the maximum number from the number of the first identification components and the number of the second identification components, dividing the number of the target identification component pairs by the maximum number to obtain a first value, wherein the first value is used for indicating the matching degree of the identification components in the first road image and the second road image of the corresponding image pair, comparing the first value with a second preset threshold value, and determining a corresponding image matching result based on the comparison result, wherein the image matching result is used for judging whether the first road image and the second road image of the corresponding image pair are the same scene image.
2. The method of claim 1, wherein said extracting feature vectors for each of said first identification components comprises:
extracting image features of the first road image based on a preset feature extraction model;
And extracting the image characteristics of the first road image based on at least one preset candidate detection frame to obtain the characteristic vector of each first identification component.
3. The method of claim 1, wherein extracting the feature vector for each of the second identification components comprises:
extracting image features of the second road image based on a preset feature extraction model;
and extracting the features of the image features of the second road image based on at least one preset candidate detection frame to obtain the feature vector of each second identification component.
4. A method according to any one of claims 1 to 3, wherein the calculating the similarity distance between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component includes:
calculating a first similar distance between the first position feature of a first sub-component and the second position feature of a second sub-component, the first sub-component being any one of the at least one first identification component and the second sub-component being any one of the at least one second identification component;
Calculating a second similar distance between the first rotational angle feature of the first sub-component and the second rotational angle feature of the second sub-component;
calculating a third similar distance between the first semantic feature of the first sub-component and the second semantic feature of the second sub-component;
and carrying out weighted summation processing on one or more of the first similar distance, the second similar distance and the third similar distance to obtain the component similarity between the first sub-component and the second sub-component.
5. The method of claim 4, wherein the calculating a first similar distance between the first location feature of a first sub-component and the second location feature of a second sub-component comprises:
and calculating the first position feature of the first sub-component and the second position feature of the second sub-component based on a preset similarity algorithm to obtain a first similar distance between the first position feature of the first sub-component and the second position feature of the second sub-component.
6. The method of claim 4, wherein the calculating a second similar distance between the first rotational angle feature of the first sub-component and the second rotational angle feature of the second sub-component comprises:
Calculating differences between respective elements in the first rotation angle feature in the first sub-component and elements at the same element positions in the second rotation angle feature in the second sub-component;
and calculating the sum of absolute values of each difference value to obtain a second similar distance between the first rotation angle characteristic of the first sub-component and the second rotation angle characteristic of the second sub-component.
7. The method of claim 4, wherein said calculating a third similar distance between said first semantic feature of said first sub-component and said second semantic feature of said second sub-component comprises:
and carrying out vector inner product calculation on the first semantic features of the first sub-component and the second semantic features of the second sub-component to obtain a third similar distance between the first semantic features of the first sub-component and the second semantic features of the second sub-component.
8. The method of claim 1, wherein comparing the first value to a second preset threshold value, and determining a corresponding image matching result based on the comparison result, comprises:
And when the first value is smaller than a second preset threshold value, determining an image matching result corresponding to the image pair as a first result, wherein the first result is used for indicating that an identification component difference exists between the first road image and the second road image.
9. The method of claim 8, wherein after determining that the image matching result corresponding to the image pair is the first result when the first value is less than a second preset threshold, the method further comprises:
determining an identification component change region in the first road image and the second road image based on the first result;
and updating the second road image based on the identification component change area.
10. A method according to any one of claims 1 to 3, wherein prior to acquiring at least one image pair, the method further comprises:
acquiring at least one first road image and at least one second road image;
and respectively carrying out image alignment processing on each first road image and each second road image in pairs to obtain an image alignment result, wherein the image alignment result comprises at least one image pair.
11. An image processing apparatus, comprising:
an acquisition unit configured to acquire at least one image pair, each of the image pairs including a first road image and a second road image, the second road image having an acquisition time earlier than that of the first road image, the first road image including at least one first identification component in a traffic sign, the second road image including at least one second identification component in a traffic sign, each of the first identification components being configured to indicate landmark identifications displayed in the first road image, each of the second identification components being configured to indicate landmark identifications displayed in the second road image;
the processing unit is used for extracting the feature vector of each first identification component and the feature vector of each second identification component, wherein the feature vector of the first identification component comprises one or more of a first position feature, a first rotation angle feature and a first semantic feature corresponding to the first identification component, the feature vector of the second identification component comprises one or more of a second position feature, a second rotation angle feature and a second semantic feature corresponding to the second identification component, the position feature is used for clearly identifying the coordinate position of the component in the road image, the rotation angle feature is used for clearly identifying the rotation angle attribute of the component in the road image, and the semantic feature indicates the semantic information of the corresponding identification component;
The processing unit is used for calculating the similarity distance between the feature vector of each first identification component and the feature vector of each second identification component to obtain the component similarity between the corresponding first identification component and the corresponding second identification component;
the processing unit is used for comparing the similarity of each part with a first preset threshold value respectively to determine a target identification part pair, and the similarity of the parts corresponding to the target identification part pair is larger than the first preset threshold value; counting the number of the target identification component pairs, the number of the first identification components and the number of the second identification components; determining an image matching result corresponding to the image pair based on the number of target identification component pairs, the number of first identification components, and the number of second identification components, including: determining the maximum number from the number of the first identification components and the number of the second identification components, dividing the number of the target identification component pairs by the maximum number to obtain a first value, wherein the first value is used for indicating the matching degree of the identification components in the first road image and the second road image of the corresponding image pair, comparing the first value with a second preset threshold value, and determining a corresponding image matching result based on the comparison result, wherein the image matching result is used for judging whether the first road image and the second road image of the corresponding image pair are the same scene image.
12. The apparatus of claim 11, wherein the processing unit is configured to:
extracting image features of the first road image based on a preset feature extraction model;
and extracting the image characteristics of the first road image based on at least one preset candidate detection frame to obtain the characteristic vector of each first identification component.
13. The apparatus of claim 11, wherein the processing unit is configured to:
extracting image features of the second road image based on a preset feature extraction model;
and extracting the features of the image features of the second road image based on at least one preset candidate detection frame to obtain the feature vector of each second identification component.
14. The apparatus according to any one of claims 11 to 13, wherein the processing unit is configured to:
calculating a first similar distance between the first position feature of a first sub-component and the second position feature of a second sub-component, the first sub-component being any one of the at least one first identification component and the second sub-component being any one of the at least one second identification component;
Calculating a second similar distance between the first rotational angle feature of the first sub-component and the second rotational angle feature of the second sub-component;
calculating a third similar distance between the first semantic feature of the first sub-component and the second semantic feature of the second sub-component;
and carrying out weighted summation processing on one or more of the first similar distance, the second similar distance and the third similar distance to obtain the component similarity between the first sub-component and the second sub-component.
15. The apparatus of claim 14, wherein the processing unit is configured to:
and calculating the first position feature of the first sub-component and the second position feature of the second sub-component based on a preset similarity algorithm to obtain a first similar distance between the first position feature of the first sub-component and the second position feature of the second sub-component.
16. The apparatus of claim 14, wherein the processing unit is configured to:
calculating differences between respective elements in the first rotation angle feature in the first sub-component and elements at the same element positions in the second rotation angle feature in the second sub-component;
And calculating the sum of absolute values of each difference value to obtain a second similar distance between the first rotation angle characteristic of the first sub-component and the second rotation angle characteristic of the second sub-component.
17. The apparatus of claim 14, wherein the processing unit is configured to:
and carrying out vector inner product calculation on the first semantic features of the first sub-component and the second semantic features of the second sub-component to obtain a third similar distance between the first semantic features of the first sub-component and the second semantic features of the second sub-component.
18. The apparatus of claim 11, wherein the processing unit is configured to:
and when the first value is smaller than a second preset threshold value, determining an image matching result corresponding to the image pair as a first result, wherein the first result is used for indicating that an identification component difference exists between the first road image and the second road image.
19. The apparatus of claim 18, wherein the processing unit is further configured to determine, based on the first result, an identification component change region in the first road image and the second road image after determining that an image matching result corresponding to the image pair is a first result when the first value is less than a second preset threshold; and updating the second road image based on the identification component change area.
20. The apparatus according to any one of claims 11 to 13, wherein the acquisition unit is further configured to: acquiring at least one first road image and at least one second road image before acquiring at least one image pair;
the processing unit is used for respectively carrying out image alignment processing on each first road image and each second road image in pairs to obtain an image alignment result, and the image alignment result comprises at least one image pair.
21. An image processing apparatus, characterized by comprising: an input/output interface, a processor, and a memory, the memory having program instructions stored therein;
the processor is configured to execute program instructions stored in a memory and to perform the method of any one of claims 1 to 10.
22. A computer readable storage medium comprising instructions which, when run on a computer device, cause the computer device to perform the method of any of claims 1 to 10.
CN202311361924.XA 2023-10-20 2023-10-20 Image processing method, device, equipment, storage medium and program product Active CN117115772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311361924.XA CN117115772B (en) 2023-10-20 2023-10-20 Image processing method, device, equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311361924.XA CN117115772B (en) 2023-10-20 2023-10-20 Image processing method, device, equipment, storage medium and program product

Publications (2)

Publication Number Publication Date
CN117115772A CN117115772A (en) 2023-11-24
CN117115772B true CN117115772B (en) 2024-01-30

Family

ID=88796903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311361924.XA Active CN117115772B (en) 2023-10-20 2023-10-20 Image processing method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN117115772B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108318043B (en) * 2017-12-29 2020-07-31 百度在线网络技术(北京)有限公司 Method, apparatus, and computer-readable storage medium for updating electronic map

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612043A (en) * 2020-04-27 2020-09-01 腾讯科技(深圳)有限公司 Road scene matching method, device and storage medium
CN112699834A (en) * 2021-01-12 2021-04-23 腾讯科技(深圳)有限公司 Traffic identification detection method and device, computer equipment and storage medium
CN115712749A (en) * 2021-08-20 2023-02-24 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium
CN114387496A (en) * 2021-12-30 2022-04-22 北京旷视科技有限公司 Target detection method and electronic equipment
CN116563583A (en) * 2023-07-07 2023-08-08 腾讯科技(深圳)有限公司 Image matching method, map information updating method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application and Update Scheme for High-Precision Maps (高精度地图的应用和更新方案); Liu Jianping et al.; Automotive Practical Technology (汽车实用技术); pp. 1-6 *

Also Published As

Publication number Publication date
CN117115772A (en) 2023-11-24

Similar Documents

Publication Publication Date Title
Yang et al. Fast and accurate vanishing point detection and its application in inverse perspective mapping of structured road
CN110866469B (en) Facial five sense organs identification method, device, equipment and medium
US20230041943A1 (en) Method for automatically producing map data, and related apparatus
CN110765882B (en) Video tag determination method, device, server and storage medium
Li et al. Hybrid recommendation algorithm of cross-border e-commerce items based on artificial intelligence and multiview collaborative fusion
Zhou et al. Indoor positioning algorithm based on improved convolutional neural network
Fan Research and realization of video target detection system based on deep learning
CN115203352A (en) Lane level positioning method and device, computer equipment and storage medium
CN113706562A (en) Image segmentation method, device and system and cell segmentation method
CN113592015B (en) Method and device for positioning and training feature matching network
Yin et al. Multimodal fusion of satellite images and crowdsourced GPS traces for robust road attribute detection
Wei et al. An RGB-D SLAM algorithm based on adaptive semantic segmentation in dynamic environment
CN116778347A (en) Data updating method, device, electronic equipment and storage medium
CN116958606A (en) Image matching method and related device
CN117115772B (en) Image processing method, device, equipment, storage medium and program product
CN117011481A (en) Method and device for constructing three-dimensional map, electronic equipment and storage medium
Zhang et al. Hierarchical Image Retrieval Method Based on Bag-of-Visual-Word and Eight-point Algorithm with Feature Clouds for Visual Indoor Positioning
CN117011692A (en) Road identification method and related device
CN115147469A (en) Registration method, device, equipment and storage medium
CN114663917A (en) Multi-view-angle-based multi-person three-dimensional human body pose estimation method and device
CN110826501B (en) Face key point detection method and system based on sparse key point calibration
CN114332174A (en) Track image alignment method and device, computer equipment and storage medium
CN115712749A (en) Image processing method and device, computer equipment and storage medium
Sun et al. Accurate deep direct geo-localization from ground imagery and phone-grade gps
Lu New efficient vanishing point detection from a single road image based on intrinsic line orientation and color texture properties

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant