CN114757195A - Method and device for identifying the meaning of layout information of road direction and tourist signs


Info

Publication number
CN114757195A
Authority
CN
China
Prior art keywords
information
road
sign
travel
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210252900.XA
Other languages
Chinese (zh)
Inventor
李春阳
孙传姣
张潇丹
李萌
廖军洪
陈永胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Research Institute of Highway Ministry of Transport
Original Assignee
Research Institute of Highway Ministry of Transport
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Institute of Highway Ministry of Transport filed Critical Research Institute of Highway Ministry of Transport
Priority to CN202210252900.XA priority Critical patent/CN114757195A/en
Publication of CN114757195A publication Critical patent/CN114757195A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method and a device for identifying the meaning of layout information of road direction and tourist signs. The method comprises the following steps: inputting road video information into a pre-trained target detection convolutional neural network rough classification model to obtain image area information of road direction and tourist signs; inputting the image area information into a pre-trained convolutional neural network fine classification model to obtain the category information of the road direction and tourist signs; inputting the image area information into a pre-trained scene character processing model to obtain the character information of the road direction and tourist signs; inputting the image area information into a pre-trained graphic symbol detection convolutional neural network model to obtain the image information of the road direction and tourist signs; and obtaining semantic information of the road direction and tourist signs from the character and image information by using a pre-designed semantic extraction rule.

Description

Method and device for identifying the meaning of layout information of road direction and tourist signs
Technical Field
The disclosure relates to the field of road traffic infrastructure digitization, and more particularly to a method and an apparatus for identifying the meaning of layout information of road direction and tourist signs.
Background
Road traffic signs are facilities that convey specific information to traffic participants through graphic symbols, colors and text in order to manage traffic and ensure safety; they are generally presented as signboards beside or above the road. Road traffic signs come in many types and can be divided into seven categories: warning signs, prohibition signs, indication signs, guide (direction) signs, tourist area signs, road construction safety signs and auxiliary signs. These signs play important roles in defining traffic behavior norms, regulating traffic flow, improving road capacity, reducing the traffic accident rate, and ensuring that traffic is safe and smooth.
This division is made mainly according to the function of the signs. From the viewpoint of extracting road traffic sign information, the signs can be further aggregated into two categories: fixed signs, which display information in a fixed form, and road direction and tourist signs, which display route and travel information. The former include most warning signs, prohibition signs, road construction safety signs and auxiliary signs, and are characterized by sign content that stays the same wherever the sign appears. The latter, by contrast, usually consist of text, numerals and graphic symbols related to the road information and are mostly found among the indication signs, guide signs and tourist area signs. Typically, on road direction and tourist signs, text identifies a place name, numerals identify the distance from the current location to a given place, and graphic symbols appear at intersections and similar locations to indicate the direction of travel from the current location to different places. The content of road direction and tourist signs is therefore closely tied to where they appear and to the road topology, and extracting their information requires combining several visual analysis technologies such as scene text detection and recognition and graphic symbol recognition.
Fixed signs have definite meanings and a relatively uniform form, and methods for recognizing them visually are relatively mature. Road direction and tourist signs are much more complex: they contain non-fixed text, numerals and graphics at the same time, and the information they express depends not only on the text and graphics themselves but also on how that content is distributed and arranged. Automatically and intelligently understanding the content of road direction and tourist signs is therefore very difficult, and it remains a frontier challenge in the field of road traffic infrastructure digitization.
Road traffic is one of the most widespread transport modes in China. When a new highway is connected to the existing road network, the road direction and tourist signs on all existing roads connected to the new highway need to be updated, including but not limited to: (1) installing new indication and guide signs that identify the connection between the new road and the existing roads; (2) updating the direction, mileage and other information on the signboards of the existing roads. This is because the topology of the road network changes once a new road is added, and the optimal paths and distances to some destinations change with it. In addition, the place names referenced by a road connection may change or be subdivided more finely: in the early days, when there were relatively few highways, the connection to a city was often marked directly with the city name, whereas with the densification and refinement of the road network, the same connection may later be split into separate eastern and western interchanges. These situations fully highlight the need for regular maintenance and updating of road traffic signs, especially road direction and tourist signs.
In actual practice, however, the maintenance and updating of these signs face implementation and regulatory difficulties. Maintaining and updating road direction and tourist traffic signs has become a bottleneck problem that urgently needs to be solved in the field of road traffic infrastructure digitization.
Disclosure of Invention
The embodiments of the disclosure aim to provide a method and a device for identifying the meaning of layout information of road direction and tourist signs, which facilitate building a digital network of road direction and tourist traffic signs by automatically extracting and analyzing the semantic information carried by those signs.
In a first aspect, the present invention provides a method for identifying the meaning of layout information of road direction and tourist signs, comprising:
inputting road video information into a pre-trained target detection convolutional neural network rough classification model to obtain road-indicating and tourist sign image area information in the road video;
inputting the image area information of the road indicator and the travel sign into a pre-trained convolutional neural network road indicator and travel sign fine classification model to obtain the category information of the road indicator and the travel sign in the image area information of the road indicator and the travel sign, wherein the category information comprises an indicator sign category, a road indicator sign category and a travel area sign category;
inputting the image area information of the road guide and the travel sign into a pre-trained scene character processing model to obtain the character information of the road guide and the travel sign in the image area information of the road guide and the travel sign, wherein the character information comprises character content information and the position information of the area where the characters are located;
inputting the image area information of the road and the travel sign into a pre-trained graph symbol detection convolutional neural network model corresponding to the category information of the road and the travel sign to obtain the image information of the road and the travel sign in the image area information of the road and the travel sign, wherein the image information comprises image content information and the position information of the area where the image is located;
and obtaining semantic information of the road and travel signs by utilizing a pre-designed semantic extraction rule according to the text content information, the position information of the region where the text is located, the image content information and the position information of the region where the image is located.
Further, the road video information includes a plurality of video frames and geographical location information of each video frame.
Further, after the step of obtaining the semantic information of the image area information of the direction and travel sign, the method further comprises the following steps:
storing the video frame where the road directing and the travel sign are located, the geographic position information of that video frame, and the semantic information of the road directing and the travel sign.
Further, the step of inputting the road video information into a pre-trained object detection convolutional neural network rough classification model to obtain the image area information of the road indicator and the travel sign in the road video comprises the following steps:
extracting the road directions and travel signs existing in each video frame by utilizing a pre-trained target detection convolutional neural network rough classification model;
if the road and the travel mark exist, returning the area position information of the target image information comprising the road and the travel mark aiming at each road and travel mark, wherein the corresponding type of the target image information is the type of the road and the travel mark;
and extracting a road directing and travel sign image area from the road video information according to the area position information of the target image information, abandoning the image information of the area without the road directing and travel sign, and storing the road directing and travel sign image area and the type corresponding to the target image information.
Further, the step of inputting the image area information of the road indicator and the travel sign into a pre-trained scene character processing model to obtain the character information in the image area information of the road indicator and the travel sign comprises:
inputting the image area information of the road and the travel sign into a pre-trained scene character detection model to obtain the position information of the area where the characters are located;
extracting character areas from the road-indicating and tourist sign image areas according to the position information of the areas where the characters are located;
inputting the extracted character area into a pre-trained scene character recognition model, and recognizing character contents in the direction and the travel sign;
the scene character processing model comprises a scene character detection model and a scene character recognition model.
In a second aspect, the present invention provides an apparatus for identifying the meaning of layout information of road direction and tourist signs, comprising:
the rough classification module is used for inputting road video information into a pre-trained target detection convolutional neural network rough classification model to obtain the image area information of the road guide and the travel sign in the road video;
the fine classification module is used for inputting the road and travel sign image area information into a pre-trained convolutional neural network road and travel sign fine classification model to obtain the category information of the road and travel signs in the road and travel sign image area information, wherein the category information comprises an indication sign category, a road sign category and a travel area sign category;
the character information processing module is used for inputting the image area information of the road guide and the travel sign into a pre-trained scene character processing model to obtain the character information of the road guide and the travel sign in the image area information of the road guide and the travel sign, wherein the character information comprises character content information and the position information of an area where the characters are located;
the image information processing module is used for inputting the image area information of the road and the travel sign into a pre-trained graph symbol detection convolutional neural network model corresponding to the category information of the road and the travel sign to obtain the image information of the road and the travel sign in the image area information of the road and the travel sign, wherein the image information comprises image content information and the position information of the area where the image is located;
and the semantic processing module is used for obtaining the semantic information of the guide and the travel sign by utilizing a pre-designed semantic extraction rule according to the text content information, the text region position information, the image content information and the image region position information.
Furthermore, the device for identifying the meaning of the layout information of the road directing and the travel mark further comprises:
the preprocessing module is used for receiving road video information and geographical position information synchronous with the road video information, extracting a plurality of video frames from the road video information and associating the plurality of video frames with the geographical position information.
Furthermore, the device for identifying the meaning of the layout information of the road directing and tourism labels further comprises:
and the information storage module is used for storing the video frame where the road directing and the travel sign are located, the geographic position information of the video frame where the road directing and the travel sign are located and the semantic information of the road directing and the travel sign.
In a third aspect, a computer readable storage medium stores a computer program, which when executed by a processor, implements the method for identifying meaning of layout information of directions and travel signs.
In a fourth aspect, a computer device, the computer device comprising:
a processor; and a memory storing a computer program which, when executed by the processor, implements the method for identifying the meaning of layout information of road direction and tourist signs.
The method and device of the invention for identifying the meaning of layout information of road direction and tourist signs extract the positions and types of direction- and tourism-related road traffic signs from collected road images through a convolutional neural network, comprehensively extract the information on the signs by combining a character detection network, a character recognition network, a graphic symbol detection network and corresponding analysis rules, and thereby digitize the road direction and tourist sign information of various roads.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a method for identifying meaning of layout information of directions and travel signs according to a first embodiment of the disclosure.
Fig. 2 is a schematic block diagram of a road guiding and travel sign layout information meaning recognition apparatus according to a second embodiment of the disclosure.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the features in the following embodiments and examples may be combined with each other; moreover, based on the embodiments in the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
Fig. 1 is a flowchart of a method for identifying meaning of layout information of directions and travel signs according to a first embodiment of the disclosure.
As shown in fig. 1:
Step 101, inputting road video information into a pre-trained object detection convolutional neural network rough classification model to obtain image area information of a road indicator and a travel sign in the road video.
Specifically, a pre-trained target detection convolutional neural network model is used to find the road direction and tourist traffic signs present in each video frame; if such signs exist, detection results in the form of rectangular boxes giving the positions of the signs are returned.
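By way of illustration only, the following minimal sketch shows how such a coarse detector could be run over the frames of a road video. It assumes a custom-trained YOLOv5 model loaded through the public Ultralytics torch.hub interface; the weights file name, class names and confidence threshold are hypothetical placeholders, not part of the patented method.

```python
# Illustrative sketch (assumption, not the patented implementation): run a
# custom-trained YOLOv5 coarse detector on frames of a road video and crop
# the detected sign-board regions. Paths and class names are placeholders.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="sign_coarse.pt")
model.conf = 0.4  # keep detections above this confidence

cap = cv2.VideoCapture("road_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = model(rgb)
    # results.xyxy[0] rows: x1, y1, x2, y2, confidence, class_id
    for *box, conf, cls in results.xyxy[0].tolist():
        x1, y1, x2, y2 = map(int, box)
        sign_crop = frame[y1:y2, x1:x2]        # image area of one sign board
        coarse_type = model.names[int(cls)]    # e.g. "direction_or_tourist_sign"
        # ... hand sign_crop to the fine classifier, OCR and symbol detector
cap.release()
```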
Step 102, inputting the image area information of the road indicating and the travel sign into a pre-trained convolutional neural network road indicating and travel sign fine classification model to obtain the category information of the road indicating and the travel sign in the image area information of the road indicating and the travel sign, wherein the category information comprises an indication sign category, a road indicating sign category and a travel area sign category.
Step 103, inputting the image area information of the road indicator and the travel sign into a pre-trained scene character processing model to obtain the character information of the road indicator and the travel sign in the image area information of the road indicator and the travel sign, wherein the character information comprises character content information and position information of an area where the characters are located.
Step 104, inputting the image area information of the road and the travel sign into a pre-trained graphic symbol detection convolutional neural network model corresponding to the category information of the road and the travel sign to obtain the image information of the road and the travel sign in the image area information of the road and the travel sign, wherein the image information comprises image content information and the position information of the area where the image is located.
Step 105, obtaining semantic information of the directions and the travel signs by utilizing a pre-designed semantic extraction rule according to the text content information, the text area position information, the image content information and the image area position information.
Specifically, the indication sign category, the guide sign category and the tourist area sign category are all processed according to the following technical scheme:
The number of signboards in the detected signboard area is first judged; if several signboards are detected, they are separated by combining a feature extraction network and a feature matching algorithm with an IoU threshold. The specific processing steps for each signboard are as follows:
1. a pre-trained scene character detection and recognition model is used to extract the character areas in the signboard and to recognize the character content in those areas;
2. a graphic symbol detection convolutional neural network model pre-trained for the sign category is used to extract the categories of the graphic symbols in the signboard and the regions where they are located; the graphic symbols contained in each category can refer to those enumerated for that category in "Road Traffic Signs and Markings, Part 2: Road Traffic Signs";
3. a semantic extraction rule designed around the extracted characters and graphic symbols and their positional proximity yields the semantic information of the signboard, for example: "straight ahead, 50 km to Zhengzhou" (an illustrative sketch of such a rule follows below).
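As a purely illustrative sketch of a proximity-based extraction rule (the embodiment does not prescribe this exact rule), each recognized text region can be paired with the nearest detected graphic symbol by comparing bounding-box centers, and a sentence template can then be filled in; all data in the example are made up.

```python
# Sketch of one possible proximity-based semantic extraction rule (assumption):
# pair each recognized text region with the closest graphic symbol and fill a
# simple sentence template such as "straight ahead, 50 km to Zhengzhou".
from math import hypot

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def extract_semantics(texts, symbols):
    """texts: [(text, box)]; symbols: [(symbol_name, box)] in sign coordinates."""
    sentences = []
    for text, tbox in texts:
        tcx, tcy = center(tbox)
        # nearest symbol by Euclidean distance between box centers
        name, _ = min(symbols,
                      key=lambda s: hypot(center(s[1])[0] - tcx,
                                          center(s[1])[1] - tcy))
        sentences.append(f"{name}, {text}")
    return sentences

texts = [("50 km to Zhengzhou", (120, 40, 380, 90))]
symbols = [("straight ahead", (40, 30, 100, 100)),
           ("turn left", (40, 150, 100, 220))]
print(extract_semantics(texts, symbols))  # ['straight ahead, 50 km to Zhengzhou']
```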
In this embodiment, based on the collected road video and the corresponding GPS information, the positions and types of direction- and tourism-related road traffic signs are extracted from the collected road images through a convolutional neural network, and the information on these signs is extracted by combining a character detection network, a character recognition network, a graphic symbol detection network and corresponding analysis rules; through these techniques the information of such road traffic signs is digitized.
In particular embodiments, there are several preferred implementations.
For example, before step 101 the method may include:
training data processing and collection:
the method comprises the steps of obtaining an original image by performing frame extraction processing on an original video, classifying the original image into a non-signboard image and a signboard image by adopting a ResNet series convolutional neural classification network, only reserving the image of the signboard category, manually labeling the reserved image as a target detection data set: using a labeling tool labelme to frame out the signboard in the image by using a rectangle, recording the position and the category of the frame, wherein the category is an indication mark, a road-indicating mark, a tourist area mark and other four categories; when labeling is performed, the following criteria are followed: 1. only the signboard is marked clearly; 2. the signboard is marked in the image and has the area size smaller than 2500 pixels; 3. discarding the image if the fuzzy indistinguishable type of the signboard exists in the image; 4. and marking the signboard meeting the requirement in the image.
Preferably, yolov5 can be used as the basic target detection algorithm. yolov5 mainly consists of two parts, a backbone network and a detection head. The backbone adopts a Darknet-style convolutional network that extracts features from the image through convolution operations; the features of the last three stages (C3, C4 and C5) are fed into an FPN feature pyramid structure, which fuses the high-level semantic information with the low-level information; the fused features then pass through 3x3 convolutions, and finally 1x1 convolutions adjust the number of channels as needed. The number of output channels is 3 × (K + 5), where 3 is the number of anchor box sizes set for each layer, K is the number of categories, and 5 can be split into 4 + 1: the 4 parameters of the target box plus 1 parameter judging whether the box contains an object.
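As a worked instance of the channel formula, with the four coarse categories used here (indication, guide, tourist area, other) each detection scale would output 3 × (4 + 5) = 27 channels:

```python
# Worked example of the 3 * (K + 5) output-channel formula: 3 anchor sizes per
# scale; per anchor, 4 box parameters + 1 objectness score + K class scores.
K = 4                               # indication, guide, tourist area, other
channels_per_scale = 3 * (K + 5)
print(channels_per_scale)           # 27
```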
Specifically, step 101 may include:
The prepared training data are used to train the target detection algorithm yolov5; the training is tuned until a suitable result is obtained, yielding the required model. The model returns the type of each sign together with a detection result in the form of a rectangular box, the types being: indication sign, guide sign, tourist area sign and other.
Preferably, step 102 comprises: the returned detections of road direction and tourist signs produced by the target detection model yolov5 are processed; each detection also contains the precise position of the signboard in the original image. According to this result the signboard is cropped from the original image to obtain a traffic sign image area, which is fed into a pre-built convolutional neural network classification model and further classified into four sub-categories: indication signs, guide signs, tourist area signs and others.
The construction process of the convolutional neural network classification model is briefly described as follows:
First, the construction of the training data set: the manually labeled target detection data set already contains framed signboards, but its categories only distinguish the first-level classes of road direction and tourist signs versus others. In this part, the images containing road direction and tourist signs are selected, the signs are cropped out of the images according to the labeling data and saved, yielding a collected data set of road direction and tourist signs, which is further divided into the sub-categories of indication sign, guide sign, tourist area sign and other. In addition, according to the specification in "Road Traffic Signs and Markings, Part 2: Road Traffic Signs", road direction and tourist signs are generated automatically by a purpose-designed algorithm, and the automatically generated signs are transplanted onto natural-scene pictures in the manner of the signs in the collected data set; in this way a batch of automatically synthesized data is obtained and saved as a synthesized data set.
Second, the construction of the convolutional neural network classification model: convolutional neural networks perform excellently on image classification tasks, and in this part a convolutional neural network classification model is built to complete the further classification of road direction and tourist signs. Investigation and experiments show that the ResNet series of convolutional neural networks adapts well to this task. A ResNet network is first pre-trained on the ImageNet data set, and the pre-trained model is then fine-tuned on the collected and synthesized road direction and tourist sign data sets, finally yielding a good sign classification model.
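A minimal fine-tuning sketch under these assumptions: torchvision's ImageNet-pretrained ResNet-50 is adapted to the four sub-categories and trained on a hypothetical ImageFolder-style directory of cropped sign images (paths, hyperparameters and the torchvision weights API are illustrative and version-dependent).

```python
# Sketch (assumption): fine-tune an ImageNet-pretrained ResNet-50 to classify
# cropped signs into indication / guide / tourist-area / other sub-categories.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("sign_crops/train", transform=tfm)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)   # four sub-categories

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```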
Third, the sub-categories of the road direction and tourist signs are obtained from the classification model, and each sub-category is analyzed with the following steps:
Because the content of road direction and tourist signs is not fixed, both the text content and the graphic symbols in a sign need to be recognized. First, the detection and recognition of the text content:
Guide sign image areas and tourist sign image areas are cropped from the original images according to the manual labeling information and saved, giving a guide sign data set and a tourist sign data set. These data sets are annotated again, marking the text regions and text content as well as the graphic symbol regions and the content of the graphic symbols; the annotated data serve as the scene text data set for training the text detection and recognition models. In addition, the data synthesis tool Style-Text is used to synthesize a large number of images resembling traffic signs in batches, and this automatically synthesized data expands the scene text data set, yielding a road direction and tourist sign text detection and recognition data set and a graphic symbol data set.
Preferably, step 103 may include:
For the text region detection model: the text on a signboard usually consists of place names, distances and prompt words; the text is generally short and its position on the signboard is relatively fixed, so the text detection network DBNet can be chosen as the text region detection model. This model is widely accepted in practical applications and handles scene text detection in a fixed type of scene well. The DBNet model is trained on the annotated and synthesized road direction and tourist sign text detection data set to build the sign text detection model.
For the text region content recognition model: this part adopts the mature character recognition model CRNN, which combines a CNN, an RNN and CTC loss into a text sequence recognition scheme; the model is trained on the road direction and tourist sign character recognition data set to build the sign text recognition model. In application, the precise position of each text region on the signboard is obtained from the text region detection model; the text region is then cropped from the signboard and input into the text recognition model, which recognizes its content, giving the character information of the signboard, and the character information and the corresponding positions are stored. Besides text regions, road direction and tourist signs also contain graphic symbols with semantic meaning, and recognizing them is also essential for understanding the content of the whole sign.
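To illustrate this two-stage text pipeline, a minimal sketch assuming the open-source PaddleOCR toolkit, which bundles a DB-based detector and a CRNN-style recognizer; in practice the default models would be replaced by ones fine-tuned on the sign-text data set described above, and the exact result nesting depends on the PaddleOCR version.

```python
# Sketch (assumption): detect and recognize text on a cropped sign image with
# PaddleOCR, which combines a DB text detector and a CRNN-style recognizer.
from paddleocr import PaddleOCR

ocr = PaddleOCR(lang="ch", use_angle_cls=False)   # Chinese detection + recognition
result = ocr.ocr("guide_sign_crop.jpg", cls=False)

for box, (text, score) in result[0]:              # one entry per detected text region
    print(text, round(score, 3), box)             # text, confidence, region corners
```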
Preferably, step 104 comprises: the recognition of graphic symbols can be treated as a target detection task, so this part still adopts the yolov5 detection network as the solution, using the road direction and tourist sign graphic symbol data sets obtained by annotation and synthesis as the training data to build a graphic symbol detection model.
The image areas of the road direction and tourist signs are input into the constructed graphic symbol detection model to obtain the category of each graphic symbol and its position on the sign. Because the content of road direction and tourist signs is not fixed, their meanings differ, so logical reasoning over the text detection and recognition results and the graphic symbol recognition results is needed to obtain the correct meaning of each sign.
For example, for two road direction and tourist signs, the corresponding information is obtained after they pass through the text detection and recognition model and the graphic symbol recognition model. For signboard 1, text content: Yixing; graphic symbol: straight ahead; together with the position information of the text and the graphic symbol. For signboard 2, text: Changzhou, Cao Bridge, 4, exit; graphic symbol: turn left; together with the position information of the text and the graphic symbol.
Corresponding to the above example, step 105 comprises: according to the rules governing how such signboards are laid out, the meanings of the two signboards can be inferred to be, respectively: 1. straight ahead leads toward Yixing; 2. turning left leads to Exit No. 4, toward Changzhou and Cao Bridge.
Fig. 2 is a schematic block diagram of a device for identifying the meaning of layout information of road direction and tourist signs according to a second embodiment of the disclosure. The method embodiment shown in fig. 1 may be used to explain this embodiment. As shown in fig. 2, the device for identifying the meaning of layout information of road direction and tourist signs comprises:
the rough classification module 201 is configured to input road video information into a pre-trained target detection convolutional neural network rough classification model to obtain information of a road indicator and a travel sign image area in the road video;
the fine classification module 202 is configured to input the road-indicating and travel sign image area information into a pre-trained convolutional neural network road-indicating and travel sign fine classification model, so as to obtain category information of the road-indicating and travel signs in the road-indicating and travel sign image area information, where the category information includes an indicator sign category, a road-indicating sign category, and a travel area sign category;
the text information processing module 203 is used for inputting the image area information of the road indicator and the travel sign into a pre-trained scene text processing model to obtain text information of the road indicator and the travel sign in the image area information of the road indicator and the travel sign, wherein the text information comprises text content information and information of the area position where the text is located;
The image information processing module 204 is configured to input the image area information of the road indicator and the travel sign into a pre-trained graph symbol detection convolutional neural network model corresponding to the category information of the road indicator and the travel sign, so as to obtain image information of the road indicator and the travel sign in the image area information of the road indicator and the travel sign, where the image information includes image content information and information of an area where the image is located;
and the semantic processing module 205 is configured to obtain semantic information of the directions and the travel signs by using a pre-designed semantic extraction rule according to the text content information, the location information of the text areas, the image content information, and the location information of the image areas.
Preferably, the device for identifying the meaning of the layout information of the road directing and travel sign further comprises: the pre-processing module 200 is configured to receive road video information and geographic position information synchronized with the road video information, extract a plurality of video frames from the road video information, and associate the plurality of video frames with the geographic position information.
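As one possible realization of this association (an assumption, since the embodiment does not prescribe a specific method), time-stamped GPS fixes recorded alongside the video can be linearly interpolated to each extracted frame's timestamp:

```python
# Sketch (assumption): associate each extracted video frame with a geographic
# position by interpolating time-stamped GPS fixes to the frame timestamp.
import bisect

def interpolate_gps(frame_time, gps_track):
    """gps_track: list of (timestamp, lat, lon) tuples sorted by timestamp."""
    times = [t for t, _, _ in gps_track]
    i = bisect.bisect_left(times, frame_time)
    if i == 0:
        return gps_track[0][1:]
    if i >= len(gps_track):
        return gps_track[-1][1:]
    (t0, lat0, lon0), (t1, lat1, lon1) = gps_track[i - 1], gps_track[i]
    a = (frame_time - t0) / (t1 - t0)
    return (lat0 + a * (lat1 - lat0), lon0 + a * (lon1 - lon0))

track = [(0.0, 34.7466, 113.6253), (10.0, 34.7470, 113.6300)]
print(interpolate_gps(3.0, track))   # interpolated position of the frame at t = 3 s
```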
Further preferably, the device for identifying the meaning of the layout information of the road directing and travel sign further comprises: and the information storage module 206 is configured to store the video frame where the road and travel sign is located, the geographic location information of the video frame where the road and travel sign is located, and semantic information of the road and travel sign.
The invention also provides a computer readable medium, wherein the computer readable medium stores computer instructions, and when the computer instructions are executed by a processor, the computer instructions cause the processor to execute the method for identifying the meaning of the road directing and travel mark layout information.
The present invention also provides a computer apparatus, comprising: at least one memory and at least one processor; the at least one memory is used to store a machine readable program; and the at least one processor is used to call the machine readable program and execute the method for identifying the meaning of layout information of road direction and tourist signs.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then a CPU or the like mounted on the expansion board or the expansion unit is caused to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the embodiments described above.
It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted according to the needs. The system structures described in the above embodiments may be physical structures or logical structures, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities separately, or some components may be implemented together in a plurality of independent devices.
In the above embodiments, the hardware unit may be implemented mechanically or electrically. For example, a hardware element may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. A hardware element may also comprise programmable logic or circuitry (e.g., a general-purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that various combinations of the technical means in the embodiments described above may be used to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims (10)

1. A method for identifying the meaning of layout information of road direction and tourist signs, characterized by comprising the following steps:
inputting road video information into a pre-trained target detection convolutional neural network rough classification model to obtain road-indicating and tourist sign image area information in the road video;
inputting the image area information of the road and the tourism mark into a pre-trained convolutional neural network road and tourism mark fine classification model to obtain the category information of the road and the tourism mark in the image area information of the road and the tourism mark, wherein the category information comprises an indication mark category, a road indication mark category and a tourism area mark category;
inputting the image area information of the road guide and the travel sign into a pre-trained scene character processing model to obtain the character information of the road guide and the travel sign in the image area information of the road guide and the travel sign, wherein the character information comprises character content information and the position information of the area where the characters are located;
inputting the image area information of the road and the travel sign into a pre-trained graph symbol detection convolutional neural network model corresponding to the category information of the road and the travel sign to obtain the image information of the road and the travel sign in the image area information of the road and the travel sign, wherein the image information comprises image content information and the position information of the area where the image is located;
and obtaining semantic information of the road guide and the travel sign by utilizing a pre-designed semantic extraction rule according to the text content information, the text region position information, the image content information and the image region position information.
2. The method of claim 1, wherein the road video information comprises a plurality of video frames and geographical location information of each video frame.
3. The method for identifying the meaning of layout information of road direction and tourist signs according to claim 2, wherein after the step of obtaining the semantic information of the road direction and tourist sign image area information, the method further comprises:
and storing the video frame where the road directing and the travel sign are located, the geographic position information of the located video frame and the semantic information of the road directing and the travel sign.
4. The method for recognizing the meaning of the layout information of the road guiding and the traveling signs according to the claim 3, wherein the step of inputting the road video information into a pre-trained object detection convolutional neural network rough classification model to obtain the image area information of the road guiding and the traveling signs in the road video comprises the following steps:
extracting the road directions and travel signs existing in each video frame by utilizing a pre-trained target detection convolutional neural network rough classification model;
if the road and the travel mark exist, returning the area position information of the target image information comprising the road and the travel mark aiming at each road and travel mark, wherein the corresponding type of the target image information is the type of the road and the travel mark;
and extracting a road directing and travel sign image area from the road video information according to the area position information of the target image information, abandoning the image information of the area without the road directing and travel sign, and storing the road directing and travel sign image area and the type corresponding to the target image information.
5. The method for identifying the meaning of layout information of road direction and tourist signs according to claim 4, wherein the step of inputting the road direction and tourist sign image area information into a pre-trained scene character processing model to obtain the character information in the road direction and tourist sign image area information comprises:
inputting the image area information of the road and the travel sign into a pre-trained scene character detection model to obtain the position information of the area where the character is located;
extracting character areas from the road-indicating and tourist sign image areas according to the position information of the areas where the characters are located;
inputting the extracted character area into a pre-trained scene character recognition model, and recognizing character contents in the direction and the travel sign;
the scene character processing model comprises a scene character detection model and a scene character recognition model.
6. A device for identifying the meaning of layout information of road direction and tourist signs, characterized by comprising:
the rough classification module is used for inputting road video information into a pre-trained target detection convolutional neural network rough classification model to obtain the image area information of the road indicator and the travel sign in the road video;
the fine classification module is used for inputting the road indicating and tourism sign image area information into a pre-trained convolutional neural network road indicating and tourism sign fine classification model to obtain the category information of the road indicating and tourism signs in the road indicating and tourism sign image area information, wherein the category information comprises an indication sign category, a road indicating sign category and a tourism area sign category;
the text information processing module is used for inputting the image area information of the road indicator and the travel sign into a pre-trained scene text processing model to obtain the text information of the road indicator and the travel sign in the image area information of the road indicator and the travel sign, wherein the text information comprises text content information and the position information of the area where the text is located;
the image information processing module is used for inputting the image area information of the road indicator and the travel sign into a pre-trained graph symbol detection convolutional neural network model corresponding to the category information of the road indicator and the travel sign to obtain the image information of the road indicator and the travel sign in the image area information of the road indicator and the travel sign, wherein the image information comprises image content information and the position information of the area where the image is located;
and the semantic processing module is used for obtaining the semantic information of the guide and the travel sign by utilizing a pre-designed semantic extraction rule according to the text content information, the text region position information, the image content information and the image region position information.
7. The device for identifying the meaning of layout information of road direction and tourist signs according to claim 6, further comprising:
the preprocessing module is used for receiving road video information and geographical position information synchronous with the road video information, extracting a plurality of video frames from the road video information and associating the plurality of video frames with the geographical position information.
8. The device for identifying the meaning of layout information of road direction and tourist signs according to claim 7, further comprising:
and the information storage module is used for storing the video frame where the road directing and the travel sign are located, the geographic position information of the video frame where the road directing and the travel sign are located and the semantic information of the road directing and the travel sign.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for identifying the meaning of layout information of road direction and tourist signs according to any one of claims 1 to 5.
10. A computer device, characterized in that the computer device comprises:
a processor; and
a memory storing a computer program which, when executed by the processor, implements the method for identifying the meaning of layout information of road direction and tourist signs according to any one of claims 1 to 5.
CN202210252900.XA 2022-03-15 2022-03-15 Method and device for identifying meaning of layout information of road directing and tourism sign Pending CN114757195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210252900.XA CN114757195A (en) 2022-03-15 2022-03-15 Method and device for identifying meaning of layout information of road directing and tourism sign

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210252900.XA CN114757195A (en) 2022-03-15 2022-03-15 Method and device for identifying meaning of layout information of road directing and tourism sign

Publications (1)

Publication Number Publication Date
CN114757195A true CN114757195A (en) 2022-07-15

Family

ID=82327534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210252900.XA Pending CN114757195A (en) 2022-03-15 2022-03-15 Method and device for identifying meaning of layout information of road directing and tourism sign

Country Status (1)

Country Link
CN (1) CN114757195A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620265A (en) * 2022-12-19 2023-01-17 华南理工大学 Locomotive signboard information intelligent identification method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN109308476B (en) Billing information processing method, system and computer readable storage medium
JP6781711B2 (en) Methods and systems for automatically recognizing parking zones
CN110378293B (en) Method for producing high-precision map based on live-action three-dimensional model
US8374390B2 (en) Generating a graphic model of a geographic object and systems thereof
CN103154974A (en) Character recognition device, character recognition method, character recognition system, and character recognition program
CN110501018A (en) A kind of traffic mark board information collecting method for serving high-precision map producing
CN106530794A (en) Automatic identification and calibration method of driving road and system thereof
CN113358125B (en) Navigation method and system based on environment target detection and environment target map
CN104778470A (en) Character detection and recognition method based on component tree and Hough forest
Ding et al. Towards generating network of bikeways from Mapillary data
CN110647886A (en) Interest point marking method and device, computer equipment and storage medium
CN112102250B (en) Method for establishing and detecting pathological image detection model with training data as missing label
CN111854766B (en) Road information determination method and device, electronic equipment and storage medium
CN114757195A (en) Method and device for identifying meaning of layout information of road directing and tourism sign
CN113095267A (en) Data extraction method of statistical chart, electronic device and storage medium
CN114003672A (en) Method, device, equipment and medium for processing road dynamic event
CN109635719A (en) A kind of image-recognizing method, device and computer readable storage medium
KR101411893B1 (en) Automatic Recognition Method of Direction Information in Road Sign Image
CN111427977A (en) Electronic eye data processing method and device
TWI451990B (en) System and method for lane localization and markings
CN114494986A (en) Road scene recognition method and device
CN114792425A (en) Artificial intelligence algorithm-based examinee test paper photo error automatic sorting method and related algorithm
Groenen et al. Panorams: automatic annotation for detecting objects in urban context
CN107577995A (en) The processing method and processing device of view data
CN112926371B (en) Road survey method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination