CN116935418A - Automatic three-dimensional graphic template reorganization method, device and system


Info

Publication number
CN116935418A
Authority
CN
China
Prior art keywords
image
template
text
automatic
dimensional graphic
Prior art date
Legal status
Granted
Application number
CN202311188895.1A
Other languages
Chinese (zh)
Other versions
CN116935418B (en)
Inventor
陈尧森
韩兴
温序铭
Current Assignee
Chengdu Sobey Digital Technology Co Ltd
Original Assignee
Chengdu Sobey Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Sobey Digital Technology Co Ltd filed Critical Chengdu Sobey Digital Technology Co Ltd
Priority to CN202311188895.1A
Publication of CN116935418A
Application granted
Publication of CN116935418B
Legal status: Active

Classifications

    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V30/19093 Proximity measures, i.e. similarity or distance measures
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an automatic three-dimensional image-text template reorganization method, device, and system, comprising the following steps: S1, acquiring an image-text template data set containing each category from a three-dimensional image-text template library; S2, fine-tuning a pre-trained CLIP model with the image-text template data set; S3, performing image region segmentation on the input image; S4, inputting the segmented image regions and the image-text template data set into the fine-tuned CLIP model to obtain the image regions meeting the condition and their corresponding image-text template categories; S5, outputting the image region positions and the corresponding image-text template categories; S6, acquiring the image region control parameters; S7, completing the image-text template reorganization of the input image according to the control parameters, the image region positions, and the corresponding image-text template categories. The application realizes automatic image-text template reorganization and generation, providing higher efficiency, accuracy, and flexibility for fields such as image-text display and report generation.

Description

Automatic three-dimensional graphic template reorganization method, device and system
Technical Field
The application relates to the technical field of computer vision and deep learning, and in particular to an automatic three-dimensional image-text template reorganization method, device, and system.
Background
As deep learning techniques continue to develop, they play an increasingly important role in computer vision tasks. High-level semantic features of image and text data can be learned and extracted automatically by deep learning methods. These features better capture the correlation between images and text, enabling more accurate and efficient recognition of image-text templates. For example, in image processing, deep learning is used for tasks such as object detection, image segmentation, and image generation; in text processing, it is used for text classification, named entity recognition, and semantic understanding.
By combining the detection, segmentation, and OCR technologies of deep learning, regions of interest can be accurately extracted from an input image, providing precise input for classifying and identifying image-text templates and for recognizing key text information in the image. These advances allow an automatic three-dimensional image-text template reorganization method to adapt to diverse scenes and complex image content, providing strong support for automated image analysis and its applications.
Disclosure of Invention
To address the problems in the prior art, an automatic three-dimensional image-text template reorganization method, device, and system are provided, realizing automatic image-text template generation and reorganization and providing higher efficiency, accuracy, and flexibility for fields such as image-text display and report generation.
The first aspect of the application provides an automatic three-dimensional image-text template reorganization method, which comprises the following steps:
S1, acquiring an image-text template data set containing each category from a three-dimensional image-text template library;
S2, fine-tuning a pre-trained CLIP model with the image-text template data set;
S3, performing image region segmentation on the input image;
S4, inputting the segmented image regions and the image-text template data set into the fine-tuned CLIP model to obtain the image regions meeting the condition and their corresponding image-text template categories;
S5, outputting the image region positions and the corresponding image-text template categories;
S6, acquiring the image region control parameters;
S7, completing the image-text template reorganization of the input image according to the control parameters, the image region positions, and the corresponding image-text template categories.
In a preferred embodiment, in step S1, the image-text template data set consists of image-text pairs formed from images and the text of their corresponding image-text template categories.
In a preferred embodiment, the fine-tuning process in step S2 includes: inputting the image-text template data set into the pre-trained CLIP model so that it captures the semantic associations between images and their categories.
In a preferred embodiment, in step S3, all objects in the input image are segmented by region using the SAM model, each segmented image region is cropped to its minimum bounding rectangle, and all segmented image regions are stored.
In a preferred embodiment, the specific substeps of step S4 include:
step S41, inputting the segmented image regions and the text of all image-text template categories into the fine-tuned CLIP model;
step S42, after the CLIP model encodes the images and the texts, computing the cosine similarity between the image and text encodings one by one to obtain the similarity score of each image-text template category;
step S43, saving the positions of the image regions whose similarity scores are higher than the threshold, together with their image-text template categories.
In a preferred embodiment, in step S5, a rectangular frame is used to select each corresponding image region according to the obtained region position and image-text template category, and the corresponding image-text template category is output on the rectangular frame to complete the visualization of the classification result.
In a preferred embodiment, the specific substeps of step S6 include:
step S61, preprocessing the input image;
step S62, locating the text and number regions in the image;
step S63, performing OCR (optical character recognition) on the text and number regions;
step S64, post-processing and correcting the recognized text and numbers;
step S65, obtaining the key text and numeric information in the input image as the control parameters of the image-text template.
In a preferred embodiment, the specific substeps of step S7 include:
step S71, generating a chart of the corresponding type from the three-dimensional image-text template library at the corresponding position according to the acquired image region position and image-text template category;
step S72, reorganizing and customizing the generated chart according to the control parameters to generate a new image-text template.
The second aspect of the application provides an automatic three-dimensional image-text template reorganization device, comprising a processor and a memory, wherein the memory stores a computer program that, when loaded by the processor, executes the automatic three-dimensional image-text template reorganization method described above.
The third aspect of the application provides an automatic three-dimensional image-text template reorganization system, comprising the automatic three-dimensional image-text template reorganization device described above.
Compared with the prior art, the beneficial effects of adopting the technical scheme are as follows:
and (3) automatic image-text template recombination generation: by using the technologies of image-text template recognition, OCR recognition, image segmentation and the like, the application can automatically extract key text, figures and image areas from an input image and recombine the key text, figures and image areas with control parameters to generate a new image-text template. The workload of manually creating the image-text templates is greatly reduced, and the generation efficiency and consistency are improved.
Accuracy and reliability: by applying computer vision and deep learning techniques, the application achieves accurate image segmentation and OCR recognition, thereby providing precise region and text recognition results. This ensures that the generated image-text templates are consistent with the original image content and that the key information is correct and reliable.
Flexibility and personalization: the reorganization step matches and combines the control parameters with the segmented regions, so that the newly generated image-text template is highly flexible and personalized. Image-text templates of various styles and formats can be customized according to different control parameters, meeting users' individual requirements.
Time and cost savings: because image-text templates are reorganized and generated automatically, the application saves the time and cost of manually creating and designing image-text templates. Users do not need to manually process and edit image regions and text content, which greatly improves working efficiency and reduces related costs.
Scalability and adaptability: the application is based on computer vision and deep learning techniques, which are highly scalable and adaptable. As these technologies develop further, the performance of image-text template recognition and reorganization can be improved by updating and optimizing the models, adding training data, and similar means.
Drawings
Fig. 1 is a flow chart of the automatic three-dimensional image-text template reorganization method of the present application.
Fig. 2 is a diagram illustrating CLIP model fine-tuning in an embodiment of the present application.
Fig. 3 is a block diagram of the image segmentation and recognition process in an embodiment of the present application.
Figs. 4 (a)-4 (c) are visualizations of image-text template recognition results in an embodiment of the present application.
Fig. 5 is a block diagram of the three-dimensional image-text template reorganization in an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar modules or modules having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application. On the contrary, the embodiments of the application include all alternatives, modifications and equivalents as may be included within the spirit and scope of the appended claims.
In order to realize automatic image-text template generation and reorganization, this embodiment provides an automatic three-dimensional image-text template reorganization method that can automatically extract key text, numbers, and image regions from an image and recombine them with control parameters to generate a new image-text template. The specific scheme is as follows:
Referring to fig. 1, the automatic three-dimensional image-text template reorganization method includes:
s1, preparing a data set.
Before reorganization, an image-text template data set containing each category needs to be prepared to support the subsequent steps; the data consists of image-text pairs formed from images and their corresponding image-text template categories. The data set should be diverse and representative, covering the various types and scenes of image-text templates.
In this embodiment, image-text templates of multiple styles are selected from a three-dimensional image-text template library to produce 2000 images in total as the fine-tuning data set: 538 bar charts, 500 pie charts, 484 maps, and 478 line charts. The category of each picture is taken as its caption, and the captions and storage paths are matched one-to-one in a CSV file to obtain the image-text pairs, completing the construction of the data set.
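As a rough sketch of this data-set construction step (the directory layout, folder names, and file format below are illustrative assumptions, not details from the patent), the CSV of image-text pairs could be built as follows:

```python
import csv
from pathlib import Path

# Assumed layout: dataset/bar_chart/*.png, dataset/pie_chart/*.png, etc.
CATEGORIES = ["bar chart", "pie chart", "map", "line chart"]

def build_csv(root: str, out_csv: str) -> None:
    """Pair each template image's storage path with its category caption."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["image_path", "caption"])
        for category in CATEGORIES:
            folder = Path(root) / category.replace(" ", "_")
            for img in sorted(folder.glob("*.png")):
                # The caption is simply the template category name.
                writer.writerow([str(img), category])

build_csv("dataset", "template_pairs.csv")
```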
S2, fine-tuning the CLIP model.
Referring to fig. 2, the pre-trained CLIP (Contrastive Language-Image Pretraining) model is fine-tuned with the constructed data set, so that the image features of the image-text templates are effectively matched with the semantic information of their categories, realizing accurate classification of image-text templates.
The CLIP model is an image-text matching deep learning model based on contrastive learning that aligns the image and text representation spaces; it consists of an image encoder and a text encoder. During contrastive training, matched image-text pairs are pulled closer together in the shared embedding space while mismatched pairs are pushed apart. This training helps the model learn better image and text representations, making it perform well on a variety of visual and language tasks, and in particular on image classification, which is why it is used as the classification model for the automatic three-dimensional image-text template reorganization task in this embodiment.
Having been trained on 400 million image-text pairs, the pre-trained CLIP model performs well at recognizing common objects, but its accuracy is lower on the specific task of automatic three-dimensional image-text template reorganization. In this embodiment, the 2000 image-text pairs constructed above are used to fine-tune the pre-trained CLIP model so that it captures the semantic associations between template images and their categories, improving the accuracy of the automatic three-dimensional image-text template reorganization.
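The patent does not name the CLIP variant or training hyperparameters, so the following fine-tuning sketch uses the Hugging Face CLIP implementation with assumed values (checkpoint, batch size, learning rate, epochs) purely for illustration:

```python
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; the patent does not specify the CLIP variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class TemplatePairs(Dataset):
    """Image-text pairs read from the CSV built in step S1."""
    def __init__(self, csv_path: str):
        self.df = pd.read_csv(csv_path)
    def __len__(self):
        return len(self.df)
    def __getitem__(self, i):
        row = self.df.iloc[i]
        return Image.open(row["image_path"]).convert("RGB"), row["caption"]

def collate(batch):
    images, captions = zip(*batch)
    return processor(text=list(captions), images=list(images),
                     return_tensors="pt", padding=True)

loader = DataLoader(TemplatePairs("template_pairs.csv"), batch_size=32,
                    shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        # return_loss=True computes CLIP's symmetric contrastive loss,
        # pulling matched image-text pairs together and pushing others apart.
        loss = model(**batch, return_loss=True).loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```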
S3, image segmentation.
Referring to fig. 3, this step operates on the image to be processed, i.e., the input image. By segmenting the input image, the objects in the image are finely segmented by region, providing accurate image regions for image-text template identification.
Specifically, in this embodiment, segmentation is performed with the image segmentation model SAM (Segment Anything Model). The SAM model comprises an image encoder, a prompt encoder, and a lightweight mask decoder.
Image encoder: a pre-trained Vision Transformer (ViT), minimally adapted to process high-resolution inputs.
Prompt encoder: handles sparse prompts (points, boxes) and dense prompts (masks). Points and boxes are represented by positional encodings added to learned embeddings for each prompt type; dense prompts (masks) are embedded with convolutions and added element-wise to the image embedding.
Mask decoder: predicts segmentation masks from the image and prompt embeddings. It maps the image embedding, the prompt embeddings, and an output token to a mask. All embeddings are updated by decoder blocks that apply self-attention over the prompts and cross-attention in both directions (prompt-to-image embedding and vice versa).
After all objects in the input image are segmented by region with the SAM model, each segmented region is cropped to its minimum bounding rectangle, and the segmented image regions are saved, as sketched below.
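A minimal sketch of this segmentation-and-cropping step using the public segment-anything package might look as follows; the ViT-H variant and the checkpoint file name are assumptions:

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Assumed model variant and checkpoint path.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one dict per segmented object

regions = []
for i, m in enumerate(masks):
    # 'bbox' is the mask's bounding box in XYWH format, i.e. the minimum
    # axis-aligned rectangle enclosing the segmented region.
    x, y, w, h = (int(v) for v in m["bbox"])
    crop = image[y:y + h, x:x + w]
    regions.append(((x, y, w, h), crop))
    cv2.imwrite(f"region_{i}.png", cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
```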
S4, identifying the image-text template.
The segmented image regions and the text of all collected image-text template categories are input into the fine-tuned CLIP model. The CLIP model judges the similarity between each image and the text information to obtain the image region positions meeting the condition and their corresponding image-text template categories. The specific steps are as follows (a code sketch follows the list):
S41, inputting the segmented image regions and the text of all image-text template categories into the fine-tuned CLIP model.
S42, the CLIP model feeds the images and the category texts into its image encoder and text encoder respectively for encoding, and the cosine similarity between the image and text encodings is computed one by one to obtain the similarity score of each image-text template category.
S43, the similarity score of each image region is compared with a set confidence threshold, and the positions of the image regions above the threshold are saved together with their corresponding image-text template categories.
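Steps S41-S43 could be sketched as follows with the fine-tuned Hugging Face CLIP model from step S2; the 0.3 similarity threshold is an illustrative assumption, since the patent leaves the confidence value unspecified:

```python
import torch

CATEGORIES = ["bar chart", "pie chart", "map", "line chart"]
THRESHOLD = 0.3  # assumed confidence threshold

@torch.no_grad()
def classify_regions(model, processor, regions):
    """regions: list of ((x, y, w, h), crop) pairs produced in step S3."""
    text_in = processor(text=CATEGORIES, return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_in)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    hits = []
    for bbox, crop in regions:
        img_in = processor(images=crop, return_tensors="pt")
        img_emb = model.get_image_features(**img_in)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        # Cosine similarity between the region and every category text.
        sims = (img_emb @ text_emb.T).squeeze(0)
        score, idx = sims.max(dim=0)
        if score.item() > THRESHOLD:
            hits.append((bbox, CATEGORIES[idx.item()], score.item()))
    return hits
```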
S5, visual output of the classification result.
After the image region positions of the input image and the corresponding image-text template categories have been acquired through the CLIP model, the results need to be output visually.
Specifically, figs. 4 (a)-4 (c) show the visualized classification results for different input images, covering a bar chart, a pie chart, and a line chart: each image region whose score is above the threshold is selected with a rectangular frame, and the corresponding image-text template category is output at the upper left corner of the frame.
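A minimal OpenCV sketch of this visualization (drawing each saved rectangle and printing its category at the upper left corner) might look like this:

```python
import cv2

def visualize(image_bgr, hits, out_path="classified.png"):
    """hits: list of ((x, y, w, h), category, score) tuples from step S4."""
    for (x, y, w, h), category, score in hits:
        # Frame the region and label it with its template category.
        cv2.rectangle(image_bgr, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(image_bgr, f"{category} {score:.2f}", (x, max(y - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imwrite(out_path, image_bgr)
```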
S6, control parameter acquisition.
Referring to fig. 5, in order to make the reorganized image-text template consistent with the original image, control parameters such as the colors, numbers, and text in the original image need to be acquired. The specific steps are as follows (a code sketch follows the list):
S61, preprocessing operations such as denoising and image enhancement are performed on the input image.
S62, a deep-learning-based text detection model is used to locate the text and number regions in the image, i.e., to determine the regions that may contain key text and numeric information.
S63, OCR is performed on the text and number regions using a convolutional-neural-network-based OCR model, converting them into computer-readable text and numbers.
S64, the recognized text and number results are normalized, and post-processing corrections such as removing incorrectly recognized characters are applied.
S65, the obtained key text and numeric information is used as the control parameters of the image-text template.
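As one possible sketch of steps S62-S65, the easyocr package combines deep-learning text detection with CNN-based recognition in a single call; treating every detected number as a data value and the remaining strings as keywords is a simplifying assumption for illustration:

```python
import re
import easyocr

reader = easyocr.Reader(["ch_sim", "en"])  # Chinese + English recognition

def extract_control_parameters(image_path: str):
    results = reader.readtext(image_path)  # (bbox, text, confidence) triples
    keywords, numbers = [], []
    for _bbox, text, conf in results:
        if conf < 0.5:  # drop low-confidence detections (part of S64)
            continue
        cleaned = text.strip()
        # Pull out numeric values (integers, decimals, percentages).
        for num in re.findall(r"\d+(?:\.\d+)?%?", cleaned):
            numbers.append(num)
        label = re.sub(r"[\d.%]+", "", cleaned).strip()
        if label:
            keywords.append(label)
    return {"keywords": keywords, "numbers": numbers}
```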
By applying computer vision and deep learning techniques, accurate image segmentation and OCR recognition can be achieved, thereby providing precise region and text recognition results. This ensures that the generated image-text templates are consistent with the original image content and that the key information is correct and reliable.
S7, reorganizing the image-text template.
The control parameters, the image region positions, and the corresponding image-text template categories are recombined in the three-dimensional image-text template library to generate a new image-text template. Specifically:
S71, a chart of the corresponding type is generated from the three-dimensional image-text template library at the corresponding position according to the image region position and the corresponding image-text template category.
S72, the generated chart is reorganized and customized according to the control parameters to generate a new image-text template. The new template is highly flexible and personalized, and can meet different requirements for image-text display and report generation.
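The three-dimensional template library itself is not public, so the following sketch substitutes matplotlib to illustrate how the detected category and the OCR control parameters could drive chart regeneration in steps S71-S72; the pairing of the i-th keyword with the i-th number is an assumption:

```python
import matplotlib.pyplot as plt

def regenerate(category, params, out_path="new_template.png"):
    """params: the {'keywords': [...], 'numbers': [...]} dict from step S6."""
    labels = params["keywords"]
    values = [float(n.rstrip("%")) for n in params["numbers"]]
    n = min(len(labels), len(values))
    labels, values = labels[:n], values[:n]

    fig, ax = plt.subplots()
    if category == "bar chart":        # S71: chart of the detected type
        ax.bar(labels, values)
    elif category == "pie chart":
        ax.pie(values, labels=labels, autopct="%1.1f%%")
    elif category == "line chart":
        ax.plot(labels, values, marker="o")
    # Map templates are omitted here; they would need geographic assets.
    fig.savefig(out_path, bbox_inches="tight")  # S72: emit the new template
    plt.close(fig)
```

In a full implementation, this stand-in would be replaced by instantiating the matched template from the three-dimensional library and binding the control parameters to its text, number, and color slots.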
By using technologies such as image-text template recognition, OCR, and image segmentation, the application can automatically extract key text, numbers, and image regions from the input image and recombine them with control parameters to generate a new image-text template, greatly reducing the workload of manually creating image-text templates and improving generation efficiency and consistency.
In practical application, the application also provides an automatic three-dimensional image-text template reorganization device, comprising a processor and a memory, wherein the memory stores a computer program that, when loaded by the processor, executes the automatic three-dimensional image-text template reorganization method described above.
In practical application, the application also provides an automatic three-dimensional image-text template reorganization system, comprising the automatic three-dimensional image-text template reorganization device described above.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the automatic three-dimensional graphic template reorganization method described above.
It should be noted that, in the description of the embodiments of the present application, unless explicitly specified and limited otherwise, the terms "disposed," "connected," and "connection" are to be construed broadly; for example, a connection may be fixed, detachable, or integral, and may be direct or indirect through an intermediate medium. The specific meaning of these terms in the present application will be understood by those skilled in the art. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain its principles. The components of the embodiments of the present application, as generally described and illustrated in the figures, may be arranged and designed in a wide variety of configurations.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (10)

1. An automatic three-dimensional image-text template reorganization method, characterized by comprising the following steps:
S1, acquiring an image-text template data set containing each category from a three-dimensional image-text template library;
S2, fine-tuning a pre-trained CLIP model with the image-text template data set;
S3, performing image region segmentation on the input image;
S4, inputting the segmented image regions and the image-text template data set into the fine-tuned CLIP model to obtain the image regions meeting the condition and their corresponding image-text template categories;
S5, outputting the image region positions and the corresponding image-text template categories;
S6, acquiring the image region control parameters;
S7, completing the image-text template reorganization of the input image according to the control parameters, the image region positions, and the corresponding image-text template categories.
2. The automatic three-dimensional image-text template reorganization method according to claim 1, wherein in step S1, the image-text template data set consists of image-text pairs formed from images and the text of their corresponding image-text template categories.
3. The automatic three-dimensional image-text template reorganization method according to claim 1 or 2, wherein the fine-tuning process in step S2 includes: inputting the image-text template data set into the pre-trained CLIP model so that it captures the semantic associations between images and their categories.
4. The automatic three-dimensional image-text template reorganization method according to claim 1, wherein in step S3, all objects in the input image are segmented by region using the SAM model, each segmented image region is cropped to its minimum bounding rectangle, and all segmented image regions are stored.
5. The automatic three-dimensional image-text template reorganization method according to claim 1, wherein the specific substeps of step S4 include:
step S41, inputting the segmented image regions and the text of all image-text template categories into the fine-tuned CLIP model;
step S42, after the CLIP model encodes the images and the texts, computing the cosine similarity between the image and text encodings one by one to obtain the similarity score of each image-text template category;
step S43, saving the positions of the image regions whose similarity scores are higher than the threshold, together with their image-text template categories.
6. The automatic three-dimensional image-text template reorganization method according to claim 1, wherein in step S5, a rectangular frame is used to select each corresponding image region according to the obtained region position and image-text template category, and the corresponding image-text template category is output on the rectangular frame to complete the visualization of the classification result.
7. The automatic three-dimensional image-text template reorganization method according to claim 1, wherein the specific substeps of step S6 include:
step S61, preprocessing the input image;
step S62, locating the text and number regions in the image;
step S63, performing OCR (optical character recognition) on the text and number regions;
step S64, post-processing and correcting the recognized text and numbers;
step S65, obtaining the key text and numeric information in the input image as the control parameters of the image-text template.
8. The automatic three-dimensional image-text template reorganization method according to claim 1, wherein the specific substeps of step S7 include:
step S71, generating a chart of the corresponding type from the three-dimensional image-text template library at the corresponding position according to the acquired image region position and image-text template category;
step S72, reorganizing and customizing the generated chart according to the control parameters to generate a new image-text template.
9. An automatic three-dimensional image-text template reorganization device, comprising a processor and a memory, wherein the memory stores a computer program that, when loaded by the processor, executes the automatic three-dimensional image-text template reorganization method of any one of claims 1-8.
10. An automatic three-dimensional image-text template reorganization system, comprising the automatic three-dimensional image-text template reorganization device according to claim 9.
CN202311188895.1A 2023-09-15 2023-09-15 Automatic three-dimensional graphic template reorganization method, device and system Active CN116935418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311188895.1A CN116935418B (en) 2023-09-15 2023-09-15 Automatic three-dimensional graphic template reorganization method, device and system


Publications (2)

Publication Number Publication Date
CN116935418A true CN116935418A (en) 2023-10-24
CN116935418B CN116935418B (en) 2023-12-05

Family

ID=88390893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311188895.1A Active CN116935418B (en) 2023-09-15 2023-09-15 Automatic three-dimensional graphic template reorganization method, device and system

Country Status (1)

Country Link
CN (1) CN116935418B (en)



Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100119157A1 (en) * 2007-07-20 2010-05-13 Fujifilm Corporation Image processing apparatus, image processing method and computer readable medium
CN105007539A (en) * 2015-07-17 2015-10-28 孙巍 HTML template-based method, equipment and system for releasing graphics and text information via television
US20210374455A1 (en) * 2020-05-29 2021-12-02 Accenture Global Solutions Limited Utilizing machine learning and image filtering techniques to detect and analyze handwritten text
CN113516142A (en) * 2020-11-26 2021-10-19 腾讯科技(深圳)有限公司 Text image matching method, device, equipment and storage medium
CN113485160A (en) * 2021-07-26 2021-10-08 中国核电工程有限公司 Simulation modeling method and device based on pattern matching recognition
CN114005123A (en) * 2021-10-11 2022-02-01 北京大学 System and method for digitally reconstructing layout of print form text
CN114187165A (en) * 2021-11-09 2022-03-15 阿里巴巴云计算(北京)有限公司 Image processing method and device
WO2023134073A1 (en) * 2022-01-11 2023-07-20 平安科技(深圳)有限公司 Artificial intelligence-based image description generation method and apparatus, device, and medium
US20230196716A1 (en) * 2022-03-02 2023-06-22 Beijing Baidu Netcom Science Technology Co., Ltd. Training multi-target image-text matching model and image-text retrieval
CN114565927A (en) * 2022-03-03 2022-05-31 上海恒生聚源数据服务有限公司 Table identification method and device, electronic equipment and storage medium
CN115294150A (en) * 2022-06-22 2022-11-04 华为技术有限公司 Image processing method and terminal equipment
CN115359323A (en) * 2022-08-31 2022-11-18 北京百度网讯科技有限公司 Image text information generation method and deep learning model training method
CN116304307A (en) * 2023-02-10 2023-06-23 武汉理工大学 Graph-text cross-modal retrieval network training method, application method and electronic equipment
CN116452410A (en) * 2023-03-10 2023-07-18 浙江工业大学 Text-guided maskless image editing method based on deep learning
CN116721419A (en) * 2023-06-26 2023-09-08 戈迪斯(杭州)智能技术有限公司 Auxiliary labeling method combined with SAM (self-contained imaging) of visual large model
CN116701637A (en) * 2023-06-29 2023-09-05 中南大学 Zero sample text classification method, system and medium based on CLIP

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAO ZHOU et al.: "A hybrid approach to detecting technological recombination based on text mining and patent network analysis", Scientometrics, vol. 121, pages 699-737 *
毛宇兆: "Research on automatic image caption generation based on deep learning" (in Chinese), China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 1, pages 138-232 *
邓显奕: "Construction of a multimodal image-text translation generation model" (in Chinese), Shanghai Journal of Translators, no. 3, pages 30-37 *
高欣 et al.: "Research on automatic image caption generation technology based on evolutionary deep learning" (in Chinese), Application Research of Computers, vol. 39, no. 3, pages 911-918 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152646A (en) * 2023-10-27 2023-12-01 武汉大学 Unmanned electric power inspection AI light-weight large model method and system
CN117152646B (en) * 2023-10-27 2024-02-06 武汉大学 Unmanned electric power inspection AI light-weight large model method and system
CN117671688A (en) * 2023-12-07 2024-03-08 北京智源人工智能研究院 Segmentation recognition and text description method and system based on hintable segmentation model

Also Published As

Publication number Publication date
CN116935418B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN116935418B (en) Automatic three-dimensional graphic template reorganization method, device and system
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN112232149A (en) Document multi-mode information and relation extraction method and system
CN111737511B (en) Image description method based on self-adaptive local concept embedding
CN112541927A (en) Method, device, equipment and storage medium for training and matting model
CN114596566B (en) Text recognition method and related device
CN112115879B (en) Self-supervision pedestrian re-identification method and system with shielding sensitivity
CN111553350A (en) Attention mechanism text recognition method based on deep learning
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113762269A (en) Chinese character OCR recognition method, system, medium and application based on neural network
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN112750071B (en) User-defined expression making method and system
CN112966676B (en) Document key information extraction method based on zero sample learning
CN112200216A (en) Chinese character recognition method, device, computer equipment and storage medium
CN113361530A (en) Image semantic accurate segmentation and optimization method using interaction means
CN115130437B (en) Intelligent document filling method and device and storage medium
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN117115824A (en) Visual text detection method based on stroke region segmentation strategy
US20220398399A1 (en) Optical character recognition systems and methods for personal data extraction
CN114241202A (en) Method and device for training dressing classification model and method and device for dressing classification
CN114155540A (en) Character recognition method, device and equipment based on deep learning and storage medium
CN114241495B (en) Data enhancement method for off-line handwritten text recognition
CN115223171B (en) Text recognition method, device, equipment and storage medium
CN116543389B (en) Character recognition method, device, equipment and medium based on relational network
CN117830537B (en) Weak supervision 3D scene graph generation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant