CN115761782A - Road engineering drawing title bar information extraction method - Google Patents

Road engineering drawing title bar information extraction method

Info

Publication number
CN115761782A
CN115761782A
Authority
CN
China
Prior art keywords: image, title bar, cell, text, information
Prior art date
Legal status
Pending
Application number
CN202211333307.4A
Other languages
Chinese (zh)
Inventor
孙力
李文浩
范梦羽
刘大为
程振波
肖刚
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202211333307.4A priority Critical patent/CN115761782A/en
Publication of CN115761782A publication Critical patent/CN115761782A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method for extracting title bar information from road engineering drawings, comprising the following steps: identifying and cropping the title bar image; segmenting the title bar into cell images; extracting the text data of each cell; extracting the semantic label of each cell; and generating a drawing index and recording it in a database. By exploiting the complementarity between the visual and textual modalities of the drawing and combining it with the line-frame structure of the title bar, the method reduces redundancy between the modalities and achieves more accurate extraction of drawing title bar information.

Description

Road engineering drawing title bar information extraction method
Technical Field
The invention belongs to the technical field of intelligent engineering information processing, and particularly relates to a method for extracting title bar information from road engineering drawings.
Background
Road construction in China has accelerated since 1979, and every participating organization has accumulated an enormous number of road engineering drawings; a medium-sized design institute typically holds more than one million of them. Most of these drawings are produced during the design process, and the data must be kept for subsequent stages including drawing review, budget estimation and construction. They are a great material asset for an enterprise, but they also impose a heavy storage and retrieval burden. A road engineering drawing generally contains a drawing frame, the drawing itself, a title bar, a countersignature bar, dimension annotations, tables, textual descriptions and so on. The title bar contains key information such as the design unit, the project name and the drawing name; it helps engineers understand the drawing quickly and has high reference value for specialized retrieval and design decision support. Available figures indicate that over 80% of engineering project designs are slight modifications or reuses of existing drawings. Faced with this steadily growing archive, retrieving drawings quickly and accurately, reusing existing design knowledge and shortening the design cycle are therefore of great significance to enterprise research, development and innovation.
Title bar information is an important component of the automatic processing of engineering drawings and of intelligent archive building, and it directly affects the accuracy and precision of the archive. Extracting and exploiting the title bar data not only helps engineering personnel search for drawings, but also provides decision support for the design of new projects; it is the premise and basis of further work such as drawing understanding, and an effective means of solving drawing storage and management problems, reducing repeated labour and improving design efficiency.
Existing methods for extracting drawing information, such as publication No. CN103761331A ("A processing system and processing method for drawing data") and publication No. CN102117269B ("Device and method for digitizing documents"), rely on third-party software such as AutoCAD to extract structural information from the drawing. They impose strict limits on the drawing file format and can only handle editable drawing files in dwg or dxf format. In practice, however, the original dwg files are usually scattered among individual designers, the files often retain several reference versions produced during design, their degree of standardization varies, they carry no official stamps, and it is difficult to confirm whether a file is the final version. To guarantee a universal data format and valid electronic image files, enterprises therefore archive scanned copies of the stamped paper drawings, and the final archived documents are in pdf format, so these methods cannot be applied. Publication No. CN101882225B ("Method and system for extracting engineering drawing material information based on a template") extracts drawing information from manually designed templates and is only suitable for drawings of certain fixed styles; the title bar formats and layouts of different enterprises and drawing types differ, for example in the size, position or label text of the attribute cells, so template-based recognition cannot meet practical requirements.
The title bar is essentially a table, so existing table extraction methods could in principle be applied to it. Methods such as publication No. CN105589841B ("PDF document table identification method") and publication No. CN103258198B ("Method for extracting characters from table document images") use additional table header information to assist positioning when recognizing the boundary coordinates of a table; in real drawings, however, the title bar is structured only by its lines, and a large amount of annotation information described in natural language (such as graphical legends and textual explanations) surrounds it. Publication No. CN112364834A performs character recognition with a text detection model after obtaining the cell vertex coordinates, but it can only locate and extract the characters of the whole table; it ignores the structural information of the table and lacks semantic information for each cell, so it cannot support subsequent operations such as information archiving and retrieval.
An engineering drawing is the outcome of engineering design and also the basis of the construction process. A road design drawing contains not only structural information made up of lines but also a large amount of annotation information described in natural language, and extracting the information in the title bar requires combining it with the line-frame structure of the drawing. Moreover, to ensure that the design drawing is consistent with the drawing actually used for construction or finally submitted for review, enterprises stipulate that pdf-format drawing files be used for archiving. Aiming at the poor adaptability of existing engineering drawing information extraction methods and the difficulty of handling drawings with complex layout structures, high noise and weak features, the invention provides an automatic method for extracting drawing title bar information.
Disclosure of Invention
The invention provides a multimodal method for extracting the title bar information of road engineering drawings. It exploits the complementarity between the visual and textual modalities of the drawing and combines the line-frame structure of the title bar when extracting drawing information, reducing redundancy between the modalities and achieving more accurate extraction of drawing title bar information.
Taking a road engineering drawing as an example: such a drawing generally comprises a drawing frame, the drawing itself, a title bar, dimension annotations, tables, textual descriptions and so on. The title bar contains information such as the design unit, the project name and the drawing name, and is located at the bottom of the drawing, as shown by label S66 in fig. 6.
The general flow of the invention is: identify and crop the title bar image; segment the title bar into cell images; extract the text data of each cell; extract the semantic label of each cell; and generate a drawing index and record it in a database.
A road engineering drawing title bar information extraction method comprises the following steps:
Step 1: recognize the title bar in the drawing with an image target detection model and crop it into an independent image file I;
Step 2: read in the title bar image I obtained in step 1, extract all horizontal lines and vertical lines contained in the image to obtain the intersection points between the cells, and iteratively segment each cell image T_i according to the distances between the intersection points and the table connectivity;
Step 3: sequentially read in the cell images T_i obtained in step 2 and extract the text or character information in each cell with a text recognition engine to obtain the cell text data W_i;
Step 4: input the cell image T_i obtained in step 2 and the text data W_i obtained in step 3 into the road engineering drawing information extraction model, extract the cell feature vector and output the cell semantic label;
Step 5: generate a drawing index from the cell semantic labels obtained in step 4 according to the specification convention, and record it in the database.
Preferably, recognizing the title bar from the drawing with the image target detection model and cropping it into an independent image file I in step 1 specifically comprises:
Step 1.1: pre-label the title bars in part of the drawings, i.e. draw a bounding box around each drawing's title bar and record its coordinates, to construct an annotated drawing data set;
Step 1.2: input the annotated drawing data set of step 1.1 into the image target detection model for training, so that the model learns the image characteristics of the title bar, including line segment positions, frame positions and line segment shapes. In the invention the image target detection model is a YOLO model; different image target detection models do not affect the result of the invention;
Step 1.3: after the training of step 1.2 is finished, input the drawing to be processed into the model, output the title bar coordinates of the drawing, and crop the title bar into an independent image file I according to these coordinates.
Preferably, extracting all horizontal lines and vertical lines contained in the image, obtaining the intersection points between the cells and segmenting each cell image T_i in step 2 specifically comprises:
Step 2.1: preprocess the title bar image I obtained in step 1, i.e. binarize it to obtain a binary image with black characters on a white background. Binarization pushes the gray value of every pixel in the image towards 0 or 255 so that the whole image shows a clear black-and-white contrast, which reduces noise interference;
Step 2.2: apply horizontal dilation and vertical dilation to the binarized image to obtain, respectively, the horizontal-line image A and the vertical-line image B. Horizontal dilation expands the white blocks of the image horizontally, erasing text and vertical lines and leaving the horizontal lines; the horizontal-line image A is shown in fig. 7 (b). Vertical dilation expands the white blocks vertically, erasing text and horizontal lines and leaving the vertical lines; the vertical-line image B is shown in fig. 7 (c);
Step 2.3: merge the horizontal-line image A and the vertical-line image B obtained in step 2.2, keep the intersection points, compute the intersection coordinates of the cells, and store the x-axis and y-axis coordinates of all intersection points separately; the intersection image C is shown in fig. 7 (d);
Step 2.4: using the intersection coordinates obtained in step 2.3, sequentially segment the cell images T_i from the title bar image I according to the distances and connectivity between the intersection points, where i is the index of a cell in the title bar and the number of cells may differ between drawings of different formats. Connectivity between intersection points refers to the four vertices that form a cell, namely upper left, upper right, lower right and lower left; two adjacent intersection points whose distance is greater than a set threshold are regarded as two points on the same cell line segment.
Preferably, extracting the text or character information in the cell images T_i with the text recognition engine in step 3 to obtain the cell text data W_i specifically comprises:
sequentially acquiring the text data in the cell images T_i with a text recognition engine and preprocessing the recognized text data to obtain the cell text data W_i. The preprocessing operation comprises removing special symbols from the text and filtering out text with low recognition confidence. Different text recognition engines do not affect the result of the invention.
Preferably, inputting the cell image T_i and the text data W_i into the road engineering drawing information extraction model, extracting the cell feature vector and outputting the cell semantic label in step 4 specifically comprises:
Step 4.1: input the cell image into the image feature extraction model to obtain the visual feature vector v;
Step 4.2: input the cell text data into the text feature extraction model to obtain the text feature vector t;
Step 4.3: concatenate the obtained visual feature vector v and text feature vector t to obtain the pre-attention vector denoted V:
V = [v_1, v_2, …, v_m, t_1, t_2, …, t_n]    (1)
The length of the vector V is m + n, where m and n are the lengths of the embedding vectors produced by the corresponding feature extraction models (the values of m and n do not affect the result of the method);
Step 4.4: compute the self-attention feature of the pre-attention vector V through an existing self-attention mechanism and denote it A. The attention mechanism can quickly extract the important features of sparse data and is widely used in natural language processing tasks. The self-attention mechanism is a variant of the attention mechanism that reduces the dependence on external information and is better at capturing the internal correlations of the data or features; it computes weight coefficients for the visual and text features to account for the mutual influence between the different modalities;
Step 4.5: multiply the self-attention feature A and the pre-attention vector V element by element to obtain the fused feature vector denoted H:
H = [A_1 v_1, A_2 v_2, …, A_m v_m, A_{m+1} t_1, A_{m+2} t_2, …, A_{m+n} t_n]    (2)
Step 4.6: input the fused feature vector H into a fully connected layer to map it to the semantic labels, then predict the probability value of each semantic label with the existing Softmax activation function, and select the label with the highest probability score as the semantic label of the cell image. The Softmax function is used for multi-class classification and maps the outputs of several neurons into the interval (0, 1); it is defined as:
Softmax(h_i) = exp(h_i) / Σ_j exp(h_j)    (3)
where h_i represents the mapping for the i-th semantic label and the sum over j runs over all semantic labels.
Preferably, generating the drawing index according to the specification convention and recording it into the database in step 5 specifically comprises:
generating the structured title bar information according to the filling rules and the topological relations of the table, and recording it into the database as the drawing index.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention locates the title bar region with a target detection neural network, which removes the dependence of conventional methods on prior knowledge of the drawing layout, copes better with drawings of different layout structures, and improves universality and flexibility;
2) The invention makes full use of the complementarity between the visual and textual modalities of the drawing and combines the title bar line-frame structure when extracting drawing information, reducing redundancy between the modalities and achieving higher accuracy than models based on a single modality.
Drawings
Fig. 1 is a schematic diagram of a method for extracting title bar information of a road engineering drawing according to an embodiment of the present application;
fig. 2 is a flowchart of a method for extracting title bar information of a road engineering drawing according to an embodiment of the present application;
FIG. 3 is a schematic diagram of title bar image processing provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of title bar cell semantic tag identification provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a semantic structure of a generated cell provided in an embodiment of the present application;
FIG. 6 is a schematic view of a road engineering drawing component provided by an embodiment of the application;
fig. 7 (a) is a title bar initial image provided in the embodiment of the present application, (b) is a result diagram of extracting horizontal lines from the title bar, (c) is a result diagram of extracting vertical lines from the title bar, and (d) is a result diagram of merging images to obtain intersections;
fig. 8 (a) is an image target detection model training set annotation diagram provided in the embodiment of the present application, and (b) is an experiment output result diagram;
fig. 9 (a) is a label graph of a training set of a text-named entity recognition model provided in the embodiment of the present application, and (b) is an experiment output result graph.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that the road engineering drawing in the embodiment of the present application may be an engineering drawing file in a PDF format, and may also be an electronic image obtained by scanning an actual drawing.
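Because the method itself operates on images, a drawing archived as a pdf first has to be rasterized before step 1. The following is a small hedged sketch of that conversion, assuming the PyMuPDF library; the 300 dpi resolution and the file names are illustrative choices, not part of the original disclosure.

```python
import fitz  # PyMuPDF

def pdf_page_to_image(pdf_path: str, page_index: int = 0, out_path: str = "drawing.png") -> str:
    """Rasterize one page of a pdf drawing so it can be fed to the image pipeline."""
    doc = fitz.open(pdf_path)
    pix = doc[page_index].get_pixmap(dpi=300)  # illustrative resolution
    pix.save(out_path)
    doc.close()
    return out_path
```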
TABLE 1 Comparison of the experimental results of the present invention with other classical models (the table is provided as an image in the original publication and is not reproduced here)
In the experiments, 1145 road engineering drawings were selected and annotated. The method exploits the complementarity between the visual and textual modalities of the drawing and combines the title bar line-frame structure when extracting drawing information, reducing redundancy between the modalities and achieving higher accuracy than single-modality models. Comparison tasks were designed for the individual modalities. For the image target detection model, the experiments use the YOLO model, a mainstream object detection framework that predicts from the whole image and learns relatively general features; it was trained on the image data set with manually annotated cell attributes and outputs a classification result for each predicted cell box. For the text named entity recognition model, the experiments use BiLSTM-CRF, a classic named entity recognition scheme that can recognize the boundaries of entity strings with specific meanings (such as organization names and person names); it takes the text data of the whole drawing as input and outputs each recognized entity and its label. The invention segments the cells after locating the title bar region and outputs the semantic label of each cell; the comparison with the other models is shown in table 1.
Except for the different network structures, all other parameters were identical in the experiments: the learning rate was initially set to 1e-4, training ran for 100 iterations, and the Adam optimizer was used throughout. Adam is one of the most popular optimizers in deep learning; it is computationally efficient, has low memory requirements and handles noisy samples well. The overall results show that, compared with the other networks, the method of the invention greatly improves comprehensive performance, recognizes more accurately, and is better suited to the task of recognizing and classifying engineering drawing title bar information.
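For readers who wish to reproduce the training setup described above, the following sketch shows the stated hyper-parameters (Adam optimizer, initial learning rate 1e-4, 100 iterations) in PyTorch; the names model and train_loader stand for any of the compared networks and their data pipeline and are assumptions, not part of the original disclosure.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, train_loader) -> None:
    """Illustrative training loop: Adam optimizer, initial learning rate 1e-4, 100 iterations."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()   # assumes the network outputs raw logits
    model.train()
    for epoch in range(100):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
```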
A road engineering drawing title bar information extraction method comprises the following steps:
1. Recognize the title bar with the image target detection model, have the model output the coordinate position of the title bar, and crop the title bar into an independent image file I according to the model output. The task of an image target detection model is to find all objects of interest in an image and determine their categories and coordinates; the general procedure comprises constructing an annotated data set, training the model and running predictions. In the invention the image target detection model is a YOLO model; different image target detection models do not affect the result of the invention. In particular, to improve the model's adaptability to drawings with different page structures, the invention additionally generates drawings with different page structures by randomly splicing title bar cells to enlarge the training set, which effectively prevents a single drawing structure from degrading the recognition performance of the model.
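The following two sketches illustrate this step; they are hedged examples, not the authoritative implementation. The first assumes an Ultralytics-style YOLO interface and Pillow, with the weights file name titlebar_yolo.pt and the single-box selection as illustrative assumptions. The second roughs out the random-splicing augmentation described above; the canvas size, margins and row-wrapping logic are likewise illustrative.

```python
from ultralytics import YOLO
from PIL import Image

def crop_title_bar(drawing_path: str, weights: str = "titlebar_yolo.pt") -> Image.Image:
    """Detect the title bar on one drawing page and crop it into the independent image file I."""
    model = YOLO(weights)                                       # detector trained on annotated drawings
    boxes = model(drawing_path)[0].boxes                        # predicted boxes for this page
    x1, y1, x2, y2 = boxes.xyxy[boxes.conf.argmax()].tolist()   # keep the highest-confidence box
    page = Image.open(drawing_path).convert("RGB")
    return page.crop((int(x1), int(y1), int(x2), int(y2)))
```

```python
import random
from PIL import Image

def splice_title_bar(cell_images: list, page_size=(2380, 1684)):
    """Paste randomly ordered title-bar cell crops onto a blank page to vary the training layouts."""
    page = Image.new("RGB", page_size, "white")
    x, y = 100, page_size[1] - 200                      # start near the bottom of the page
    boxes = []                                          # bounding boxes to use as detector labels
    for cell in random.sample(cell_images, k=len(cell_images)):
        if x + cell.width > page_size[0] - 100:         # wrap to a new row when the margin is reached
            x, y = 100, y - cell.height
        page.paste(cell, (x, y))
        boxes.append((x, y, x + cell.width, y + cell.height))
        x += cell.width
    return page, boxes
```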
2. Read in the title bar image I, extract all horizontal lines and vertical lines contained in the image to obtain the intersection points between the cells, and sequentially segment each cell image T_i according to the distances between the intersection points and the table connectivity. The cell segmentation process is shown in fig. 3 and comprises the following steps:
Step 2.1: as shown by reference numeral S31 in fig. 3, to reduce the noise interference of the scanned image, binarize the image so that it shows black characters on a white background. Binarization pushes the gray value of every pixel towards 0 or 255 so that the whole image shows a clear black-and-white contrast;
Step 2.2: as shown by reference numeral S32 in fig. 3, apply a horizontal dilation operation to the original title bar image I, i.e. expand the white blocks horizontally and erase text and vertical lines. Because white gaps exist between the strokes of the text, horizontally dilating the white blocks erases most of the text, leaving only the title bar horizontal lines and a few horizontal strokes, which gives the image A containing only horizontal lines, as shown in fig. 7 (b);
Step 2.3: as shown by reference numeral S33 in fig. 3, apply a vertical dilation operation to the original image, i.e. expand the white blocks vertically and erase text and horizontal lines, so that only the vertical lines of the title bar remain, which gives the image B containing only vertical lines, as shown in fig. 7 (c);
Step 2.4: as shown by reference numeral S34 in fig. 3, superimpose and merge the image A obtained in step 2.2 and the image B obtained in step 2.3 to obtain the intersection image C, as shown in fig. 7 (d); obtain the coordinates of the frame intersections and store the x-axis and y-axis coordinates of all intersection points separately;
Step 2.5: as shown by reference numeral S35 in fig. 3, traverse all the intersection coordinates obtained in step 2.4 and sequentially segment the cell images T_i according to the distances and connectivity between the intersection points, where i is the index of a cell in the title bar; the number of cells may differ between drawings of different formats, and in this embodiment there are 13 cells. Connectivity between intersection points refers to the four vertices that form a cell, namely upper left, upper right, lower right and lower left; two adjacent intersection points whose distance is greater than a set threshold are regarded as two points on the same cell line segment. In the invention the intersection distance threshold is set to 80 pixels according to the size of the title bar image (an illustrative sketch of this segmentation is given after step 3 below).
3. Sequentially acquire the text data of each cell image T_i with a text recognition engine such as EasyOCR; different text recognition engines do not affect the results of the invention. Specifically, to ensure the robustness of subsequent word segmentation, this embodiment screens the confidence of each piece of recognized text, keeps text information whose confidence is greater than 0.5, and uniformly strips special symbols from the text while filtering out texts of unqualified length. Special symbols include, for example, !, @, #, $, %, ^, &, *, (, ), +, =, |, {, }, :, ;, ,, ., 、, \, [, ], <, >, /, ?, ~, … and their full-width equivalents such as ￥, （, ）, 【 and 】.
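Two further hedged sketches follow. The first implements the cell segmentation of step 2 above with OpenCV: the erode-then-dilate opening with long rectangular kernels is a standard equivalent of the dilation-based line extraction described there, and the kernel sizes, the clustering gap and the regular-grid cropping are simplifying assumptions (the connectivity handling for merged cells is omitted). The second implements the OCR step just described, assuming the EasyOCR API; the language list and the regular expression are illustrative, while the 0.5 confidence threshold follows the text above.

```python
import cv2
import numpy as np

def segment_cells(title_bar_path: str, gap: int = 10):
    """Step 2 sketch: extract horizontal/vertical lines, find their intersections, crop cells."""
    gray = cv2.imread(title_bar_path, cv2.IMREAD_GRAYSCALE)
    # step 2.1: binarize; inverting makes lines and characters white on a black background
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # steps 2.2/2.3: opening with long rectangular kernels keeps only horizontal (A) or vertical (B) lines
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
    image_a = cv2.dilate(cv2.erode(binary, h_kernel), h_kernel)
    image_b = cv2.dilate(cv2.erode(binary, v_kernel), v_kernel)
    # step 2.4: intersections of the two line images (image C)
    image_c = cv2.bitwise_and(image_a, image_b)
    ys, xs = np.where(image_c > 0)

    def cluster(values):
        """Group nearby pixel coordinates into single grid-line positions."""
        values = sorted(set(values.tolist()))
        groups, current = [], [values[0]]
        for v in values[1:]:
            if v - current[-1] <= gap:
                current.append(v)
            else:
                groups.append(current)
                current = [v]
        groups.append(current)
        return [int(np.mean(g)) for g in groups]

    # step 2.5 (simplified): crop every cell T_i on the regular grid of line positions
    xs_grid, ys_grid = cluster(xs), cluster(ys)
    cells = []
    for y0, y1 in zip(ys_grid, ys_grid[1:]):
        for x0, x1 in zip(xs_grid, xs_grid[1:]):
            cells.append(gray[y0:y1, x0:x1])
    return cells
```

```python
import re
import easyocr

reader = easyocr.Reader(["ch_sim", "en"])   # Chinese + English recognition

def read_cell_text(cell_image) -> str:
    """Step 3 sketch: OCR one cell image, keep confident text, strip special symbols."""
    results = reader.readtext(cell_image)    # list of (bbox, text, confidence) tuples
    text = "".join(t for _, t, conf in results if conf > 0.5)   # confidence threshold 0.5
    # drop special symbols; this character class is illustrative, not exhaustive
    return re.sub(r"[!@#$%^&*()+=|{}:;,.、\\\[\]<>/?~…￥【】]", "", text)
```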
4. Input the cell image T_i and the corresponding text data W_i into the road engineering drawing information extraction model, extract the cell feature vector and output the cell semantic label. The process is shown in fig. 4 and comprises the following steps:
Step 4.1: as shown by reference numeral S41 in fig. 4, input the cell image T_i into the image feature extraction model; the invention uses a ResNet model (different image feature extraction models do not affect the result of the invention) to obtain the visual feature vector v. In particular, to keep its length consistent with that of the text feature vector, the ResNet output is passed through a fully connected layer so that the visual embedding vector v has length 768 in the experiments (the feature vector length does not affect the result of the invention). ResNet is a widely used image feature extraction model that largely overcomes the vanishing-gradient problem of deeper networks, and the features it extracts are more abstract and carry semantic information;
Step 4.2: as shown by reference numeral S42 in fig. 4, input the cell text data W_i into the text feature extraction model; the invention uses a BERT model (different text feature extraction models do not affect the result of the invention) to obtain the text feature vector t, whose length of 768 is determined by the BERT model. BERT learns vector representations of the similarity between words through self-supervised learning and, beyond the vectors themselves, learns multiple layers of transformations of them, so it is widely used in natural language processing tasks and can quickly extract features from sparse text data;
Step 4.3: as shown by reference numeral S43 in fig. 4, concatenate the obtained visual feature vector v and text feature vector t to obtain the pre-attention vector denoted V:
V = [v_1, v_2, …, v_m, t_1, t_2, …, t_n]    (1)
The length of the vector V is m + n, where m and n are the lengths of the embedding vectors produced by the corresponding pre-trained models (the values of m and n do not affect the result of the method);
Step 4.4: as shown by reference numeral S44 in fig. 4, compute the self-attention feature vector of the pre-attention vector V through an existing self-attention mechanism and denote it A. The attention mechanism can quickly extract the important features of sparse data and is therefore widely used in natural language processing tasks. The self-attention mechanism is a variant of the attention mechanism that reduces the dependence on external information and is better at capturing the internal correlations of the data or features; it computes weight coefficients for the visual and text features to account for the mutual influence between the different modalities;
Step 4.5: as shown by reference numeral S45 in fig. 4, multiply the self-attention feature vector A and the pre-attention vector V element by element to obtain the fused feature vector denoted H:
H = [A_1 v_1, A_2 v_2, …, A_m v_m, A_{m+1} t_1, A_{m+2} t_2, …, A_{m+n} t_n]    (2)
Step 4.6: as shown by reference numeral S46 in fig. 4, input the fused feature vector H into the fully connected layer to map it to the semantic labels, predict the score of each semantic label with the existing Softmax activation function, and output the label with the highest probability score as the semantic label of the cell image. The Softmax function is used for multi-class classification and maps the outputs of several neurons into the interval (0, 1); it is defined as:
Softmax(h_i) = exp(h_i) / Σ_j exp(h_j)    (3)
where h_i represents the mapping for the i-th semantic label and the sum over j runs over all semantic labels. Specifically, in this embodiment the cell semantic labels in the title bar are: drawing unit, project, drawing name, personnel function, name, signature, date, drawing number and attribute.
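The following PyTorch sketch assembles steps 4.1 to 4.6 as described in this embodiment: ResNet visual features projected to length 768, BERT text features of length 768, attention-weighted fusion, and a fully connected layer followed by Softmax. Because the text does not spell out the exact self-attention formulation, the sketch substitutes a simple learned per-dimension attention weighting as a stand-in; the ResNet-18 variant, the bert-base-chinese checkpoint and the nine-label head are illustrative choices rather than requirements of the invention.

```python
import torch
import torch.nn as nn
import torchvision.models as models
from transformers import BertModel

class TitleBarCellClassifier(nn.Module):
    def __init__(self, num_labels: int = 9, dim: int = 768):
        super().__init__()
        resnet = models.resnet18(weights=None)      # illustrative ResNet variant (step 4.1)
        resnet.fc = nn.Identity()                   # drop the ImageNet classification head
        self.visual_backbone = resnet               # outputs 512-dimensional features
        self.visual_proj = nn.Linear(512, dim)      # fully connected layer -> length m = 768
        self.text_backbone = BertModel.from_pretrained("bert-base-chinese")  # n = 768 (step 4.2)
        fused = 2 * dim                             # m + n
        # stand-in for the self-attention of step 4.4: one learned weight per fused dimension
        self.attention = nn.Sequential(nn.Linear(fused, fused), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(fused, num_labels)   # step 4.6: map H onto the semantic labels

    def forward(self, cell_image, input_ids, attention_mask):
        v = self.visual_proj(self.visual_backbone(cell_image))               # visual vector v
        t = self.text_backbone(input_ids=input_ids,
                               attention_mask=attention_mask).pooler_output  # text vector t
        V = torch.cat([v, t], dim=-1)      # pre-attention vector, eq. (1)
        A = self.attention(V)              # attention weights A
        H = A * V                          # fused feature vector, eq. (2)
        return torch.softmax(self.classifier(H), dim=-1)   # label probabilities, eq. (3)
```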
5. As shown in fig. 5, according to the layout information of the title bar, the content of each cell and the cell semantic labels obtained in step 4, organize the data in the table as key-value pairs, generate the structured title bar information and record it into the database as the drawing index.
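A small hedged sketch of this final step follows, assuming SQLite as the database; the table schema and the field names are illustrative. The cells argument is the label-to-text dictionary obtained by pairing the cell semantic labels of step 4 with the recognized texts of step 3.

```python
import sqlite3

def index_drawing(db_path: str, drawing_file: str, cells: dict) -> None:
    """Step 5 sketch: store the key-value pairs of one title bar as a drawing index record."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS drawing_index
                   (file TEXT, drawing_unit TEXT, project TEXT, drawing_name TEXT, drawing_no TEXT)""")
    con.execute("INSERT INTO drawing_index VALUES (?, ?, ?, ?, ?)",
                (drawing_file,
                 cells.get("drawing unit", ""),
                 cells.get("project", ""),
                 cells.get("drawing name", ""),
                 cells.get("drawing number", "")))
    con.commit()
    con.close()
```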
Those skilled in the art will readily appreciate that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and for convenience and simplicity of description, the foregoing functional units are merely illustrated, and in practical applications, the foregoing functions may be divided into different modules according to requirements. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A road engineering drawing title bar information extraction method is characterized by comprising the following steps:
step 1: recognizing a title bar from a drawing by using an image target detection model, and cutting the title bar into an independent image file I;
step 2: reading in the title bar image I obtained in step 1, extracting all horizontal lines and vertical lines contained in the image to obtain the intersection points between the cells, and iteratively segmenting each cell image T_i according to the distances between the intersection points and the table connectivity;
step 3: sequentially reading in the cell images T_i obtained in step 2, and extracting the text or character information in each cell by using a text recognition engine to obtain the cell text data W_i;
step 4: inputting the cell image T_i obtained in step 2 and the text data W_i obtained in step 3 into the road engineering drawing information extraction model, extracting the cell feature vector, and outputting the cell semantic label;
step 5: generating a drawing index from the cell semantic labels obtained in step 4 according to the specification convention, and recording the drawing index into a database.
2. The method for extracting the title bar information of the road engineering drawing as claimed in claim 1, wherein: the specific steps of the step 1 are as follows:
step 1.1: pre-labeling title bars in part of drawings, namely selecting the title bars of the drawings by frames, recording coordinates, and constructing a labeled drawing data set;
step 1.2: inputting the annotation drawing data set in the step 1.1 into an image target detection model for training, and learning the image characteristics of the title bar, wherein the image characteristics comprise line segment positions, frame positions and line segment shapes;
step 1.3: and (3) after the model training in the step (1.2) is finished, inputting the drawing to be extracted into the model, outputting the title bar coordinates of the drawing, and cutting the title bar into an independent image file I according to the coordinates.
3. The method for extracting the title bar information of the road engineering drawing as claimed in claim 1, wherein: the specific steps of step 2 are as follows:
step 2.1: preprocessing the title bar image I obtained in step 1, namely performing binarization processing on it to obtain a binary image of black characters on a white background; the binarization processing makes the gray value of each pixel point in the image tend to 0 or 255 so that the whole image shows an obvious black-and-white contrast, which reduces noise interference;
step 2.2: performing horizontal dilation and vertical dilation on the binarized image to obtain, respectively, the horizontal-line image A and the vertical-line image B; the horizontal dilation operation expands the white blocks in the image horizontally, erasing text and vertical lines to obtain the horizontal lines; the vertical dilation operation expands the white blocks in the image vertically, erasing text and horizontal lines to obtain the vertical lines;
step 2.3: merging the horizontal-line image A and the vertical-line image B obtained in step 2.2, keeping the intersection points, calculating the intersection coordinates of the cells, and storing the x-axis and y-axis coordinates of all intersection points separately;
step 2.4: according to the intersection coordinates obtained in step 2.3, sequentially segmenting the cell images T_i from the title bar image I according to the distances and connectivity between the intersection points; wherein i is the index of a cell in the title bar, the connectivity between intersection points refers to the positions of the four vertices forming the cell, namely upper left, upper right, lower right and lower left, and two adjacent intersection points whose distance is greater than a set threshold are regarded as two points on the same cell line segment.
4. The method for extracting the title bar information of the road engineering drawing as claimed in claim 1, wherein: the specific steps of step 3 are as follows:
sequentially acquiring the text data in the cell images T_i by using a text recognition engine and preprocessing the recognized text data to obtain the cell text data W_i; the preprocessing operation comprises removing special symbols from the text and filtering out text whose recognition confidence is lower than 60%.
5. The method for extracting the title bar information of the road engineering drawing as claimed in claim 1, wherein: the specific steps of step 4 are as follows:
step 4.1: inputting the cell image into the image feature extraction model to obtain the visual feature vector v;
step 4.2: inputting the cell text data into the text feature extraction model to obtain the text feature vector t;
step 4.3: splicing the obtained visual feature vector v and the text feature vector t to obtain the pre-attention vector denoted V, expressed as:
V = [v_1, v_2, …, v_m, t_1, t_2, …, t_n]    (1)
the length of the vector V being m + n, where the lengths m and n of the embedding vectors are determined by the corresponding feature extraction models;
step 4.4: obtaining the self-attention feature of the pre-attention vector V through an existing self-attention mechanism and denoting it A; calculating weight coefficients of the visual features and the text features to account for the mutual influence between the different modalities;
step 4.5: multiplying the self-attention feature A and the pre-attention vector V element by element to obtain the fused feature vector denoted H:
H = [A_1 v_1, A_2 v_2, …, A_m v_m, A_{m+1} t_1, A_{m+2} t_2, …, A_{m+n} t_n]    (2)
step 4.6: inputting the fused feature vector H into a fully connected layer to map it to each semantic label, then predicting the probability value of each semantic label by using the existing Softmax activation function, and selecting the label with the highest probability score as the semantic label of the cell image to output; the Softmax function is used in the multi-classification process and maps the outputs of a plurality of neurons into the interval (0, 1), and is defined as:
Softmax(h_i) = exp(h_i) / Σ_j exp(h_j)    (3)
where h_i represents the mapping of the i-th semantic label and the sum over j runs over all semantic labels.
6. The method for extracting the title bar information of the road engineering drawing as claimed in claim 1, wherein: the specific steps of the step 5 are as follows:
organizing the data in the table in a key value pair mode according to the layout information of the title bar, the content of each cell and the semantic label of the cell obtained in the step 4, and generating structured information of the title bar to be used as a drawing index to be recorded into a database.
CN202211333307.4A 2022-10-28 2022-10-28 Road engineering drawing title bar information extraction method Pending CN115761782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211333307.4A CN115761782A (en) 2022-10-28 2022-10-28 Road engineering drawing title bar information extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211333307.4A CN115761782A (en) 2022-10-28 2022-10-28 Road engineering drawing title bar information extraction method

Publications (1)

Publication Number Publication Date
CN115761782A true CN115761782A (en) 2023-03-07

Family

ID=85355680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211333307.4A Pending CN115761782A (en) 2022-10-28 2022-10-28 Road engineering drawing title bar information extraction method

Country Status (1)

Country Link
CN (1) CN115761782A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994282A (en) * 2023-09-25 2023-11-03 安徽省交通规划设计研究总院股份有限公司 Reinforcing steel bar quantity identification and collection method for bridge design drawing
CN116994282B (en) * 2023-09-25 2023-12-15 安徽省交通规划设计研究总院股份有限公司 Reinforcing steel bar quantity identification and collection method for bridge design drawing

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN112100426B (en) Method and system for universal form information retrieval based on visual and text features
US9454714B1 (en) Sequence transcription with deep neural networks
CN110297931B (en) Image retrieval method
US9141853B1 (en) System and method for extracting information from documents
CN115424282A (en) Unstructured text table identification method and system
JPS61267177A (en) Retrieving system for document picture information
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
CN111860525B (en) Bottom-up optical character recognition method suitable for terminal block
CN112580507B (en) Deep learning text character detection method based on image moment correction
CN113936195B (en) Sensitive image recognition model training method and device and electronic equipment
Jiao et al. A survey of road feature extraction methods from raster maps
CN110838105A (en) Business process model image identification and reconstruction method
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN111368761B (en) Shop business state recognition method and device, readable storage medium and equipment
CN112241730A (en) Form extraction method and system based on machine learning
CN106485272A (en) The zero sample classification method being embedded based on the cross-module state of manifold constraint
CN114386504A (en) Engineering drawing character recognition method
Cai et al. A comparative study of deep learning approaches to rooftop detection in aerial images
CN114677695A (en) Table analysis method and device, computer equipment and storage medium
CN115761782A (en) Road engineering drawing title bar information extraction method
Jun et al. Automatic classification and recognition of complex documents based on Faster RCNN
CN116597270A (en) Road damage target detection method based on attention mechanism integrated learning network
CN116071389A (en) Front background matching-based boundary frame weak supervision image segmentation method
CN114330247A (en) Automatic insurance clause analysis method based on image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination