WO2023178798A1 - 图像分类方法、装置、设备及介质 - Google Patents

图像分类方法、装置、设备及介质 (Image classification method, apparatus, device, and medium)

Info

Publication number
WO2023178798A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
text
features
classified
segmentation
Prior art date
Application number
PCT/CN2022/090437
Other languages
English (en)
French (fr)
Inventor
唐小初
张祎頔
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023178798A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • The present application relates to the field of intelligent decision-making in artificial intelligence, and in particular to an image classification method, apparatus, electronic device, and computer-readable storage medium.
  • Image detection tasks such as image classification are increasingly used in daily production and life, for example, searching for similar products by image recognition or, in the transportation industry, capturing and analyzing driving images to automatically identify illegal driving.
  • However, a single machine learning model has limited image-feature representation capability and cannot analyze and learn images from multiple aspects. Moreover, a single model cannot combine the advantages of multiple machine learning models with different characteristics, so the accuracy of image classification by a single model still needs to be improved.
  • An image classification method provided by this application includes: acquiring an image to be classified and extracting image features of the image to be classified; recognizing text content in the image to be classified and extracting text features of the text content; fusing the image features and the text features to obtain a fusion feature; using a pre-trained activation function to calculate probability values between the fusion feature and a plurality of preset classification labels; and using a pre-trained integrated classification model to perform image classification analysis on the image to be classified according to the fusion feature and the probability values, to obtain a classification result of the image to be classified.
  • This application also provides an image classification device, which includes:
  • a feature extraction module used to obtain the image to be classified, extract the image features of the image to be classified, identify the text content in the image to be classified, and extract the text features of the text content
  • a feature fusion module used to fuse the image features and the text features to obtain fusion features
  • a classification analysis module, configured to use a pre-trained activation function to calculate probability values between the fusion feature and a plurality of preset classification labels, and to use a pre-trained integrated classification model to perform image classification analysis on the image to be classified according to the fusion feature and the probability values, obtaining a classification result of the image to be classified.
  • This application also provides an electronic device, which includes: a memory storing at least one computer program; and a processor that executes the program stored in the memory to implement the image classification method described above.
  • The present application also provides a computer-readable storage medium in which at least one computer program is stored, the at least one computer program being executed by a processor in an electronic device to implement the image classification method described above.
  • The embodiments of the present application use the fusion feature obtained by fusing image features and text features, together with the classification probability values corresponding to the fusion feature, as the input of the pre-trained integrated classification model.
  • On the one hand, a multi-modal fusion feature is more comprehensive and carries higher information value than a single-modal feature, which can improve the accuracy of image classification.
  • At the same time, using the classification probability values corresponding to the fusion feature as one of the inputs can improve the learning efficiency of the pre-trained integrated classification model.
  • On the other hand, the pre-trained integrated classification model can effectively combine the advantages of machine learning models with different characteristics, improving the accuracy of image classification.
  • Figure 1 is a schematic flowchart of an image classification method provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a detailed implementation flow of one step in the image classification method provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of a detailed implementation flow of one step in the image classification method provided by an embodiment of the present application.
  • Figure 4 is a functional module diagram of an image classification device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device implementing the image classification method provided by an embodiment of the present application.
  • the embodiment of the present application provides an image classification method.
  • the execution subject of the image classification method includes, but is not limited to, at least one of a server, a terminal, and other electronic devices that can be configured to execute the method provided by the embodiments of the present application.
  • the image classification method can be executed by software or hardware installed on the terminal device or the server device, and the software can be a blockchain platform.
  • The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms.
  • FIG. 1 is a schematic flowchart of an image classification method provided by an embodiment of the present application.
  • the image classification method includes:
  • the image classification method is explained by taking the classification of products by color based on product images as an example.
  • the images to be classified may be a preset number of product images.
  • the image features of the image to be classified include but are not limited to product outline feature data and product color feature data in the image.
  • a pre-constructed neural network can be used to extract the image features of the image to be classified.
  • In detail, referring to Figure 2, step S1 includes: S11, performing color-space normalization on the image to be classified to obtain a standard image; S12, dividing each standard image into multiple image blocks according to a preset ratio, calculating the pixel gradient of each pixel in each image block, and obtaining the gradient histogram of each image block from the pixel gradients; and S13, converting the gradient histograms into vectors and concatenating the vectors of all gradient histograms to obtain the image features of the image to be classified.
  • In the embodiments of the present application, a preset normalization formula can be used to normalize the pixel value of each pixel in each image to be classified, so that the pixel values are mapped into a preset value range; this normalizes the color space of the image to be classified and yields a standard image.
  • Exemplarily, the normalization formula can be Z_i = (x_i - min(X)) / (max(X) - min(X)), where Z_i is the normalized value of the i-th pixel in the image to be classified, x_i is the pixel value of the i-th pixel in the image to be classified, max(X) is the largest pixel value in the image to be classified, and min(X) is the smallest pixel value in the image to be classified.
  • By performing color-space normalization on the image to be classified, the contrast of the image can be adjusted and the impact of local shadows and illumination changes on the image features can be reduced, which helps improve the accuracy of the extracted image features.
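  • As a rough illustration only, the sketch below applies the min-max formula above to every pixel of an image; the function name and the use of NumPy are assumptions for illustration and not part of the original disclosure.

```python
import numpy as np

def normalize_color_space(image: np.ndarray) -> np.ndarray:
    """Map every pixel of `image` into [0, 1] using Z_i = (x_i - min(X)) / (max(X) - min(X))."""
    image = image.astype(np.float64)
    x_min, x_max = image.min(), image.max()
    if x_max == x_min:                        # avoid division by zero on a flat image
        return np.zeros_like(image)
    return (image - x_min) / (x_max - x_min)

# Example: turn an 8-bit product image into the "standard image".
standard_image = normalize_color_space(np.random.randint(0, 256, (64, 64)))
```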
  • Further, the standard image can be divided into multiple image blocks according to a preset ratio, and the pixel gradient of each pixel in each image block can be calculated one by one. By computing pixel gradients, the contour information of objects in the standard image can be captured, while further weakening the interference of illumination and improving the accuracy of the image features.
  • A preset gradient algorithm can be used to calculate the pixel gradient of each pixel in each image block; the gradient algorithm includes, but is not limited to, the two-dimensional discrete derivative algorithm, the Sobel operator, and so on.
  • Embodiments of the present application can compute a gradient histogram for each image block from the pixel gradients, use the values of the histogram bins to generate a vector identifying that gradient histogram, and concatenate the vectors of all gradient histograms into the image features of the image to be classified.
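  • A minimal sketch of this block-wise gradient-histogram feature follows, assuming grayscale input, finite-difference gradients, 8x8 blocks, and 9 orientation bins; these concrete choices are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def gradient_histogram_features(standard_image: np.ndarray,
                                block: int = 8, bins: int = 9) -> np.ndarray:
    """Split the image into blocks, build an orientation histogram per block
    weighted by gradient magnitude, and concatenate all histograms."""
    gy, gx = np.gradient(standard_image)               # per-pixel gradients
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)

    h, w = standard_image.shape
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            mag = magnitude[r:r + block, c:c + block].ravel()
            ori = orientation[r:r + block, c:c + block].ravel()
            hist, _ = np.histogram(ori, bins=bins, range=(0, np.pi), weights=mag)
            feats.append(hist)
    return np.concatenate(feats)                        # image feature vector

image_features = gradient_histogram_features(np.random.rand(64, 64))
```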
  • It is understandable that, in practical applications, products are usually displayed with both pictures and text; for example, on a product browsing page, in addition to product images, product-related text descriptions such as name, specifications, and color are also provided. In this case, a product has both image features embodied by the product image and text features provided by the product description information.
  • Since step S1 only obtains the image features of the image to be classified, it performs image analysis but does not analyze the text information in the image. Therefore, in order to improve the accuracy of image classification, the embodiments of the present application identify the text content in the image to be classified and analyze that text content.
  • OCR technology can be used to identify the text content in the image to be classified.
  • In detail, using all the word vectors to generate the text vector matrix corresponding to the text content includes: selecting text segments one by one from the multiple text segments as the target segment, and counting the number of co-occurrences of the target segment and its adjacent text segments within a preset neighborhood of the target segment; constructing a co-occurrence matrix from the co-occurrence counts corresponding to each text segment; concatenating all the word vectors into a vector matrix; and multiplying the co-occurrence matrix by the vector matrix to obtain the text vector matrix corresponding to the text content.
  • In the embodiments of the present application, because the text content consists of natural language, analyzing it directly would occupy a large amount of computing resources and make the analysis inefficient. Therefore, the text content can be converted into a text vector matrix, turning text expressed in natural language into numerical form.
  • In detail, a preset standard dictionary can be used to segment the text content into multiple text segments, where the standard dictionary contains multiple standard segments.
  • For example, substrings of the text content of different lengths are looked up in the standard dictionary; if a standard segment identical to a substring of the text content is retrieved, the retrieved standard segment is determined to be a text segment of the text content.
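  • The dictionary lookup described above can be sketched as a simple forward maximum-matching scan; the toy dictionary, the English example string, and the fallback to single characters are assumptions made only for illustration.

```python
def segment_text(text: str, standard_dict: set, max_len: int = 6) -> list:
    """Forward maximum matching: at each position try the longest substring
    that appears in the standard dictionary, otherwise emit one character."""
    segments, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in standard_dict:
                segments.append(candidate)
                i += length
                break
    return segments

# Example with a toy dictionary.
print(segment_text("redcottonshirt", {"red", "cotton", "shirt"}))
# ['red', 'cotton', 'shirt']
```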
  • Exemplarily, the co-occurrence counts corresponding to each text segment can be used to construct a co-occurrence matrix X, where X_{i,j} is the number of co-occurrences, within the text content, of keyword i and its adjacent text segment j.
  • In the embodiments of the present application, models with word-vector conversion capability, such as the word2vec model or other NLP (Natural Language Processing) models, can be used to convert the multiple text segments into word vectors; the word vectors are then concatenated into the vector matrix of the text content, and the vector matrix is multiplied by the co-occurrence matrix to obtain the text vector matrix.
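  • A compact sketch of how the co-occurrence matrix and the text vector matrix might be assembled; the window size, the random toy word vectors, and the use of NumPy are illustrative assumptions only.

```python
import numpy as np

def text_vector_matrix(segments: list, word_vecs: dict, window: int = 2) -> np.ndarray:
    """Count co-occurrences of each segment with its neighbours inside `window`,
    stack the word vectors into a matrix, and multiply the two matrices."""
    vocab = sorted(set(segments))
    idx = {w: k for k, w in enumerate(vocab)}
    co = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(segments):
        for j in range(max(0, i - window), min(len(segments), i + window + 1)):
            if j != i:
                co[idx[w], idx[segments[j]]] += 1      # co-occurrence count X[i, j]
    vec = np.stack([word_vecs[w] for w in vocab])       # vector matrix, one row per segment
    return co @ vec                                      # text vector matrix

segs = ["red", "cotton", "shirt", "red", "shirt"]
vecs = {w: np.random.rand(8) for w in set(segs)}
W = text_vector_matrix(segs, vecs)
```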
  • In detail, because the text content contains a large number of text segments but not every segment is a feature of the text content, the multiple text segments need to be filtered. The embodiments of the present application select text segments one by one as the target segment and calculate the key value of the target segment from its word vector and the text vector matrix, so that representative feature segments can be filtered out according to the key values to obtain the text features of the text content.
  • Specifically, calculating the key value of the target segment from its word vector and the text vector matrix includes computing it with a preset key-value algorithm (its formula is given only as an image in the published application), where K is the key value, W is the text vector matrix, T denotes matrix transposition, | | denotes the modulus, and the remaining symbol is the word vector of the target segment.
  • In the embodiments of the present application, a preset number of text segments are selected from the multiple text segments as feature segments, in descending order of their key values.
  • For example, suppose the multiple text segments include text segment A, text segment B, and text segment C, with key values 80, 70, and 30, respectively. If the preset number is 2, then text segment A and text segment B are selected as feature segments in descending order of key value, and the word vectors of text segment A and text segment B are concatenated to obtain the text features of the text content.
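  • Since the exact key-value formula is published only as an image, the sketch below substitutes a plausible relevance score (the norm of the word vector projected through the text vector matrix, normalized by both norms); that scoring choice is an assumption, while the descending sort, top-k selection, and concatenation follow the text above.

```python
import numpy as np

def select_feature_segments(segments, word_vecs, W, k=2):
    """Score each candidate segment against the text vector matrix W, keep the
    top-k by score, and concatenate their word vectors as the text feature."""
    def key_value(v):
        # Assumed stand-in for the patent's key-value formula.
        return np.linalg.norm(W @ v) / (np.linalg.norm(W) * np.linalg.norm(v))

    scored = sorted(set(segments), key=lambda w: key_value(word_vecs[w]), reverse=True)
    chosen = scored[:k]
    text_feature = np.concatenate([word_vecs[w] for w in chosen])
    return chosen, text_feature

segments = ["red", "cotton", "shirt"]
word_vecs = {w: np.random.rand(8) for w in segments}
W = np.random.rand(3, 8)                       # stand-in text vector matrix
chosen, text_feature = select_feature_segments(segments, word_vecs, W, k=2)
```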
  • In another embodiment of the present application, a pre-built BERT model can be used to extract the text features of the text content.
  • It is understandable that, in deep learning algorithms, multi-modal features usually need to be fused; this feature fusion can be performed before, during, or after model training. In the embodiments of the present application, the image features and the text features are fused, and the fusion features are used for subsequent training of the relevant models.
  • In detail, fusing the image features and the text features to obtain the fusion features includes: performing matrix conversion on the image features to obtain image features with the same dimensions as the text features; and using a preset fully connected layer network to associate the text features with the converted image features to obtain the fusion features.
  • In the embodiments of the present application, the dimensions of the image features and the text features may be different; to facilitate the fusion calculation of the two kinds of features, their dimensions need to be aligned first.
  • the image features can be subjected to matrix transformation processing through the reshape function.
  • the preset fully connected layer network is a convolutional neural network based on deep learning.
  • Exemplarily, the following preset fusion function can be used to generate the fusion features: F = dense(softmax(dot(Q, transpose(K))) * K), where F is the fusion feature, Q is the converted image feature, K is the text feature, transpose is the transposition function, dot is matrix multiplication, softmax is the activation function, and dense is the convolution computation of the preset fully connected layer network.
  • In the embodiments of the present application, by fusing the image features and the text features and using the fusion features for subsequent analysis, the subsequent computation workload can be reduced on the one hand, and the amount of effective information in the fused features can be increased on the other.
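  • The fusion function above reads like a single attention step followed by a dense projection; the sketch below mirrors that reading with NumPy, where the dense weights, the feature dimensions, and the placement of the closing parenthesis in the published formula are assumptions rather than confirmed details.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(Q: np.ndarray, K: np.ndarray, W_dense: np.ndarray) -> np.ndarray:
    """F = dense(softmax(dot(Q, transpose(K))) * K): weight the text features K by
    the reshaped image features Q, then apply a stand-in dense projection."""
    attn = softmax(Q @ K.T)       # dot(Q, transpose(K)) followed by softmax
    mixed = attn @ K              # '* K' read as matrix multiplication with K
    return mixed @ W_dense        # stand-in for the fully connected (dense) layer

Q = np.random.rand(4, 16)         # image features reshaped to the text dimension
K = np.random.rand(6, 16)         # text features
F = fuse_features(Q, K, np.random.rand(16, 16))
```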
  • In the embodiments of the present application, a pre-trained activation function can be applied to the fusion features to calculate the probability value between each feature in the fusion features and each of a plurality of preset classification labels, where the probability value is the probability that a feature belongs to a given class; the higher the relative probability between a feature and a classification label, the more likely that feature expresses that label.
  • In detail, the activation function includes, but is not limited to, the softmax activation function, the sigmoid activation function, and the ReLU activation function; the plurality of preset classification labels includes, but is not limited to, blue, white, yellow, gray, and so on.
  • In one embodiment of the present application, the following activation function can be used to calculate the probability value: p(a|x) = exp(w_a^T x) / Σ_{a'=1}^{A} exp(w_{a'}^T x), where p(a|x) is the relative probability between the fusion feature x and the classification label a, w_a is the weight vector of classification label a, T is the transposition operation, exp is the exponential operation, and A is the number of preset classification labels.
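  • A small sketch of the label-probability computation, assuming the formula is the standard softmax over per-label weight vectors; the label names and random weight values are placeholders for illustration.

```python
import numpy as np

def label_probabilities(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """p(a|x) = exp(w_a^T x) / sum over a' of exp(w_a'^T x), for each of the A labels."""
    scores = W @ x                         # one score per classification label
    scores -= scores.max()                 # numerical stability
    e = np.exp(scores)
    return e / e.sum()

labels = ["blue", "white", "yellow", "gray"]
W = np.random.rand(len(labels), 32)        # one weight vector w_a per label
probs = label_probabilities(np.random.rand(32), W)
print(dict(zip(labels, probs.round(3))))
```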
  • S5. Use the pre-trained integrated classification model to perform image classification analysis on the image to be classified according to the fusion feature and the probability values, to obtain a classification result of the image to be classified.
  • In the embodiments of the present application, the pre-trained integrated classification model may be a preset number of classifier models built on the XGBoost (X-Gradient Boosting Decision Tree, super gradient boosting tree) ensemble learning principle, or a preset number of classifier models built on a K-fold voting mechanism.
  • In one embodiment of the present application, under the K-fold voting mechanism, a voting operation is performed on the classification probability values output by each classifier of the pre-trained integrated classification model for each image to be classified, in order to determine the final classification result of each image to be classified.
  • In another embodiment of the present application, the pre-trained integrated classification model automatically learns, using the XGBoost learning principle and based on the fusion features of each image to be classified and the probability values corresponding to those fusion features, the weighted probability of each classifier in the model, thereby ensuring the accuracy of the classification results for the images to be classified.
  • In the embodiments of the present application, a preset number of classifiers in the pre-trained integrated classification model perform classification analysis on the images to be classified, with each classifier outputting a classification probability value for each image; the final classification result of each image to be classified can then be decided, following the XGBoost ensemble learning principle, from the weights of the different classifiers and the classification probability values for each image.
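  • To illustrate how a preset number of classifiers could be combined, the sketch below weights each classifier's probability output and picks the highest-scoring label; the classifier outputs and weights are placeholders, not the XGBoost-learned values from the disclosure.

```python
import numpy as np

def ensemble_classify(prob_outputs: list, weights: list, labels: list) -> str:
    """Combine per-classifier probability vectors with classifier weights and
    return the label with the highest weighted score."""
    combined = sum(w * p for w, p in zip(weights, prob_outputs))
    return labels[int(np.argmax(combined))]

labels = ["blue", "white", "yellow", "gray"]
# Probability vectors produced by three classifiers for one image (placeholders).
prob_outputs = [np.array([0.6, 0.2, 0.1, 0.1]),
                np.array([0.5, 0.3, 0.1, 0.1]),
                np.array([0.2, 0.5, 0.2, 0.1])]
weights = [0.5, 0.3, 0.2]                  # stand-ins for learned classifier weights
print(ensemble_classify(prob_outputs, weights, labels))   # -> "blue"
```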
  • The embodiments of the present application use the fusion feature obtained by fusing image features and text features, together with the classification probability values corresponding to the fusion feature, as the input of the pre-trained integrated classification model.
  • On the one hand, a multi-modal fusion feature is more comprehensive and carries higher information value than a single-modal feature, which can improve the accuracy of image classification.
  • At the same time, using the classification probability values corresponding to the fusion feature as one of the inputs can improve the learning efficiency of the pre-trained integrated classification model.
  • On the other hand, the pre-trained integrated classification model can effectively combine the advantages of machine learning models with different characteristics, improving the accuracy of image classification.
  • FIG. 4 is a functional module diagram of an image classification device provided by an embodiment of the present application.
  • the image classification device 100 described in this application can be installed in electronic equipment. According to the implemented functions, the image classification device 100 may include a feature extraction module 101, a feature fusion module 102, and a classification analysis module 103.
  • the module described in this application can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of the electronic device and can complete a fixed function, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the feature extraction module 101 is used to obtain the image to be classified, extract the image features of the image to be classified, identify the text content in the image to be classified, and extract the text features of the text content;
  • the feature fusion module 102 is used to fuse the image features and the text features to obtain fusion features;
  • The classification analysis module 103 is configured to use a pre-trained activation function to calculate probability values between the fusion feature and a plurality of preset classification labels, and to use a pre-trained integrated classification model to perform image classification analysis on the image to be classified according to the fusion feature and the probability values, obtaining a classification result of the image to be classified.
  • In detail, each module in the image classification device 100 described in the embodiments of the present application adopts, in use, the same technical means as the image classification method described above with reference to Figures 1 to 3, and can produce the same technical effects; details are not repeated here.
  • FIG. 5 is a schematic structural diagram of an electronic device for implementing the image classification method provided by an embodiment of the present application.
  • the electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an image classification program.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (such as SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 , such as a mobile hard disk of the electronic device 1 .
  • In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can not only be used to store application software installed on the electronic device 1 and various types of data, such as the code of an image classification program, but can also be used to temporarily store data that has been output or is to be output.
  • the processor 10 may be composed of an integrated circuit, for example, it may be composed of a single packaged integrated circuit, or it may be composed of multiple integrated circuits packaged with the same function or different functions, including one or more Central processing unit (CPU), microprocessor, digital processing chip, graphics processor and various control chip combinations, etc.
  • The processor 10 is the control core (Control Unit) of the electronic device; it connects the various components of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (such as the image classification program) and calling the data stored in the memory 11.
  • the bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • the bus is configured to enable connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 5 only shows an electronic device with some of its components; persons skilled in the art will understand that the structure shown in FIG. 5 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
  • the electronic device 1 may also include a power supply (such as a battery) that supplies power to various components.
  • The power supply may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • the power supply may also include one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components.
  • the electronic device 1 may also include a variety of sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described again here.
  • Further, the electronic device 1 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish communication connections between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface, which may be a display (Display) or an input unit (such as a keyboard).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, or the like.
  • the display may also be appropriately referred to as a display screen or a display unit, and is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • The image classification program stored in the memory 11 of the electronic device 1 is a combination of multiple instructions; when run by the processor 10, it can implement the image classification method described above.
  • If the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be volatile or non-volatile.
  • For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM, Read-Only Memory).
  • This application also provides a computer-readable storage medium, which may be volatile or non-volatile.
  • The readable storage medium stores a computer program; when executed by the processor of an electronic device, the computer program can implement the image classification method described above.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in various embodiments of the present application can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software function modules.
  • Blockchain is a new application model of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database: a chain of data blocks generated and linked using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • Blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • The embodiments of the present application may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.


Abstract

An image classification method and an image classification apparatus, comprising: extracting image features and text features of an image to be classified (S1, S2); fusing the image features and the text features to obtain a fusion feature (S3); using a pre-trained activation function to calculate probability values between the fusion feature and a plurality of preset classification labels (S4); and using a pre-trained integrated classification model to perform image classification analysis on the image to be classified according to the fusion feature and the probability values to obtain a classification result of the image to be classified (S5), thereby improving the accuracy and efficiency of image classification.

Description

图像分类方法、装置、设备及介质
本申请要求于2022年03月25日提交中国专利局、申请号为202210299096.0,发明名称为“图像分类方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能的智能决策技术领域,尤其涉及一种图像分类方法、装置、电子设备及计算机可读存储介质。
背景技术
随着基于神经网络的机器学习技术的进步,图像分类等图像检测在日常的生产或生活中有着越来越广泛的应用,例如,基于图像识别搜索类似商品,在交通行业中,通过抓取及分析驾驶图像,自动识别违规驾驶等。
发明人意识到,当前的图像分类较多是基于某一图像分类算法,构建相应的机器学习模型,利用所述机器学习模型提取所述图像的图像特征,进而对所述图像特征进行分类分析。但是单一机器学习模型,图像特征表示能力有限,不能从多方面对图像进行分析和学习,同时,单一机器学习模型不能很好的结合不同特性的多个机器学习模型的优势,导致单一机器学习模型的图像分类的精确性有待提升。
发明内容
本申请提供的一种图像分类方法,包括:
获取待分类图像,提取所述待分类图像的图像特征;
识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
对所述图像特征及所述文本特征进行融合,得到融合特征;
利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
本申请还提供一种图像分类装置,所述装置包括:
特征提取模块,用于获取待分类图像,提取所述待分类图像的图像特征,识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
特征融合模块,用于对所述图像特征及所述文本特征进行融合,得到融合特征;
分类分析模块,用于利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值,利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
本申请还提供一种电子设备,所述电子设备包括:
存储器,存储至少一个计算机程序;及
处理器,执行所述存储器中存储的程序以实现如下所述的图像分类方法:
获取待分类图像,提取所述待分类图像的图像特征;
识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
对所述图像特征及所述文本特征进行融合,得到融合特征;
利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进 行图像分类分析,得到所述待分类图像的分类结果。
本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一个计算机程序,所述至少一个计算机程序被电子设备中的处理器执行以实现如下所述的图像分类方法:
获取待分类图像,提取所述待分类图像的图像特征;
识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
对所述图像特征及所述文本特征进行融合,得到融合特征;
利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
本发明实施例利用融合图像特征和文本特征后的融合特征以及所述融合特征对应的分类概率值作为所述预先训练的集成分类模型的输入,一方面,多模态的融合特征相较于单一模态的特征,特征更全面,信息价值更高,可以提升图像分类的精准度,同时,将所述融合特征对应的分类概率值作为输入之一,可以提升所述预先训练的集成分类模型的学习效率。另一方面,利用所述预先训练的集成分类模型可以有效结合不同特征的机器学习模型的优势,提升图像分类的准确性。
附图说明
图1为本申请一实施例提供的图像分类方法的流程示意图;
图2为本申请一实施例提供的图像分类方法中其中一个步骤的详细实施流程示意图;
图3为本申请一实施例提供的图像分类方法中其中一个步骤的详细实施流程示意图;
图4为本申请一实施例提供的图像分类装置的功能模块图;
图5为本申请一实施例提供的实现所述图像分类方法的电子设备的结构示意图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请实施例提供一种图像分类方法。所述图像分类方法的执行主体包括但不限于服务端、终端等能够被配置为执行本申请实施例提供的该方法的电子设备中的至少一种。换言之,所述图像分类方法可以由安装在终端设备或服务端设备的软件或硬件来执行,所述软件可以是区块链平台。所述服务端可以是独立的服务器,也可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(ContentDeliveryNetwork,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。
参照图1所示,为本申请一实施例提供的图像分类方法的流程示意图。在本实施例中,所述图像分类方法包括:
S1、获取待分类图像,提取所述待分类图像的图像特征;
本申请实施例中,以基于产品图像对产品按照颜色进行分类为例,说明所述图像分类方法。其中,所述待分类图像可以是预设数量的产品图像。所述待分类图像的图像特征包括但不限于图像中的产品轮廓特征数据、产品颜色特征数据。
本申请实施例中,可以利用预先构建的神经网络提取所述待分类图像的图像特征。
详细地,参阅图2所示,所述S1,包括:
S11、对所述待分类图像进行色彩空间归一化处理,得到标准图像;
S12、将每张所述标准图像按照预设比例划分为多个图像块,计算每个所述图像块中每个像素的像素梯度,根据所述像素梯度统计得到每个所述图像块的梯度直方图;
S13、将所述梯度直方图转换为向量,并将所有梯度直方图的向量进行拼接,得到所述待分类图像的图像特征。
本申请实施例中,可以利用预设的归一化公式对每张所述待分类图像中每个像素点的像素值进行归一化运算,以将所述待分类图像中每个像素点的像素值映射至预设值域内,实现对所述待分类图像进行色彩空间归一化,得到标准图像。
示例性地,所述归一化公式可以为:
Z_i = (x_i - min(X)) / (max(X) - min(X))
其中,Z i为所述待分类图像中第i个像素的归一化数值,x i为所述待分类图像中第i个像素的像素值,max(X)为所述待分类图像中最大的像素值,min(X)为所述待分类图像中最小的像素值。
本申请实施例中,通过对所述待分类图像进行色彩空间归一化处理,可调节图像的对比度,降低图像局部的阴影和光照变化对图像特征所造成的影响,有利于提高提取图像特征的精确度。
进一步地,可将所述标准图像按照预设比例划分为多个图像块,并逐一计算每一个像素块中每个像素的像素梯度,通过计算像素梯度,可捕获所述标准图像中物体的轮廓信息,同时进一步弱化光照的干扰,提高图像特征的精确度。
其中,可利用预设的梯度算法计算每一个图像块中每个像素的像素梯度,所述梯度算法包括但不限于二维离散求导算法、soble算子等。
本申请实施例可根据所述像素梯度,统计出每个图像块中的梯度直方图,进而利用所述梯度直方图中各梯度的值,生成用于标识该梯度直方图的向量,并将所有梯度直方图的向量拼接为所述待分类图像的图像特征。
S2、识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
可以理解的是,在实际应用中,产品展示通常采用图片加文字的方式,例如,在商品浏览网页,除展示商品图片信息外,还会提供商品相关的名称、规格、颜色等文字描述信息。这种情况下,一种产品会包含由产品图像体现的图像特征以及由产品描述信息提供的文本特征。
本申请实施例中,由于步骤S1获取的是待分类图像的图像特征,仅是对所述待分类图像进行图像分析,并未对所述待分类图像的文本信息进行分析,因此,为了提高对图像分类的精确度,本申请实施例识别所述待分类图像中的文本内容,并对所述文本内容进行分析。
本申请实施例中,可以利用OCR技术识别所述待分类图像中的文本内容。
详细地,参阅图3所示,所述提取所述文本内容的文本特征
S21、对所述文本内容进行分词,得到多个文本分词;
S22、生成每个所述文本分词对应的词向量,利用所有所述词向量生成所述文本内容对应的文本向量矩阵;
S23、从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,根据所述目标分词的词向量及所述文本向量矩阵,计算所述目标分词的关键值;
S24、按照所述关键值从大到小的顺序从所述多个文本分词中选取预设数量的文本分词为特征分词;
S25、将所述特征分词的词向量进行拼接,得到所述文本内容的文本特征。
详细地,所述利用所有所述词向量生成所述文本内容对应的文本向量矩阵,包括:从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,并统计所述目标分词和所述目标分词的相邻文本分词在所述目标分词的预设邻域范围内共同出现的共现次数;利用 每一个文本分词对应的共现次数构建共现矩阵;将所有所述词向量拼接为向量矩阵;利用所述共现矩阵和所述向量矩阵进行乘积运算,得到所述文本内容对应的文本向量矩阵。
本申请实施例中,由于所述文本内容由自然语言组成,若直接对所述文本内容进行分析,会占用大量的计算资源,导致分析的效率低下,因此,可将所述文本内容转换为文本向量矩阵,进而将由自然语言表达的文本内容转换为数值形式。
详细地,可采用预设的标准词典对所述文本内容进行分词处理,得到多个文本分词,所述标准词典中包含多个标准分词。
例如,将所述文本内容按照不同的长度在所述标准词典中进行检索,若能检索到与所述文本内容相同的标准分词,则可确定检索到的该标准分词为所述文本内容的文本分词。
示例性地,可利用每一个文本分词对应的所述共现次数构建如下所示的共现矩阵:
Figure PCTCN2022090437-appb-000002
其中,X i,j为所述文本内容中关键词i与该关键词i的相邻文本分词j的共现次数。
本申请实施例中,可采用word2vec模型、NLP(NaturalLanguageProcessing,自然语言处理)模型等具有词向量转换功能的模型分别将所述多个文本分词转换为词向量,进而将词向量拼接为所述文本内容的向量矩阵,并将所述向量矩阵与所述共现矩阵进行乘积运算,得到文本向量矩阵。
详细地,由于所述文本内容中包含大量的文本分词,但并非每一个文本分词均是该文本内容的特征,因此,需要对所述多个文本分词进行筛选,本申请实施例从所述多个文本分词中逐个选取其中一个文本分词为目标分词,根据所述目标分词的词向量与所述文本向量矩阵计算所述目标分词的关键值,以根据所述关键值筛选出对该文本内容具有代表性的特征分词,以实现获取该文本内容的文本特征。
具体地,所述根据所述目标分词的词向量与所述文本向量矩阵计算所述目标分词的关键值,包括:
利用如下关键值算法计算所述目标分词的关键值:
Figure PCTCN2022090437-appb-000003
其中,K为所述关键值,|W|为所述文本向量矩阵,T为矩阵转置符号,||为求模符号,
Figure PCTCN2022090437-appb-000004
为所述目标分词的词向量。
本申请实施例中,按照每一个文本分词的关键值从大到小的顺序从所述将所述多个文本分词中选取预设数量的文本分词为特征分词。
例如,所述多个文本分词包括:文本分词A、文本分词B和文本分词C,其中,文本分词A的关键值为80,文本分词B的关键值为70,文本分词C的关键值为30,若预设数量为2,则按照所述关键值从大到小的顺序,选取文本分词A和文本分词B为特征分词,并将所述文本分词A和所述文本分词B的词向量进行拼接,得到所述文本内容的文本特征。
本申请另一实施例中,可以采用预先构建的Bert模型提取所述文本内容的文本特征。
S3、对所述图像特征及所述文本特征进行融合,得到融合特征;
可以理解的是,在深度学习算法中,针对多模态特征,通常需要将多模态特征进行特征融合,所述特征融合可以在模型训练前、训练中以及训练后进行,本申请实施例中,针对所述图像特征及所述文本特征这两种特征,进行特征融合,利用融合特征进行后续的相关模型的训练。
详细地,所述对所述图像特征及所述文本特征进行融合,得到融合特征,包括:对所述图像特征进行矩阵转换处理,得到与所述文本特征相同维度的图像特征;利用预设的全 连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征。
本申请实施例中,所述图像特征及所述文本特征对应的维度可能不同,为了便于将两种特征进行融合计算,需要先将所述图像特征及所述文本特征对应的维度进行对齐。
本申请实施例中,可以通过reshape函数对所述图像特征进行矩阵转换处理。
本申请实施例中,所述预设的全连接层网络是基于深度学习的卷积神经网络。
示例性地,可以利用如下预设的融合函数,生成融合特征:
F=dense(softmax(dot(Q,transpose(K)))*K)
F为所述融合特征,Q为所述转换后的图像特征,K为所述文本特征,transpose为转置函数,dot为矩阵乘法,softmax为激活函数,dense为所述预设的全连接层网络的卷积计算算法。
本申请实施例中,通过对所述图像特征及所述文本特征进行融合,利用融合特征进行后续的分析,一方面可以减少后续的计算工作量,另一方面,可以提升融合后特征的有效信息量。
S4、利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
本申请实施例中,可分别利用预先训练的激活函数对所述融合特征进行计算,以计算所述融合特征中每一个特征与预设多个分类标签之间的概率值,其中,所述概率值是指每一个特征是某一种分类的概率值,当某一特征与某一分类标签之间的相对概率越高,则该特征是用于表达该分类标签的概率越高。
详细地，所述激活函数包括但不限于softmax激活函数、sigmoid激活函数、relu激活函数，所述预设的多个分类标签包括但不限于蓝色、白色、黄色、灰色等。
本申请其中一个实施例中,可利用如下激活函数计算所述概率值:
p(a|x) = exp(w_a^T x) / Σ_{a'=1}^{A} exp(w_{a'}^T x)
其中,p(a|x)为所述融合特征x和分类标签a之间的相对概率,w a为分类标签a的权重向量,T为求转置运算符号,exp为求期望运算符号,A为预设的多个分类标签的数量。
S5、利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。;
本申请实施例中,所述预先训练的集成分类模型可以是基于XGBoost(X-GradientBoostingDecisionTree,超梯度提升树)集成学习原理构建的预设数量的分类器模型,也可以是基于K-fold投票机制构建的预设数量的分类器模型。
本申请其中一个实施例中,可依据K-fold投票机制,根据所述预先训练的集成分类每个所述分类器针对每张所述待分类图像输出的分类概率值,进行相关的投票操作,确定每张所述待分类图像的最终分类结果。
本申请另一实施例中,所述预先训练的集成分类模型根据每张所述待分类图像的融合特征及所述融合特征对应的概率值,利用XGBoost学习原理自动学习所述预先训练的集成分类模型中每个所述分类器的加权概率,从而保障对所述待分类图像分类结果的准确性。
本申请实施例中,利用所述预先训练的分类模型中预设数量的分类器对所述待分类图像进行分类分析,其中,每个所述分类器针对每张所述待分类图像输出分类概率值,可依据XGBoost集成学习原理,根据不同分类器的权值以及每张所述分类图像的分类概率值,决策每张所述待分类图像的最终分类结果。
本申请实施例利用融合图像特征和文本特征后的融合特征以及所述融合特征对应的分类概率值作为所述预先训练的集成分类模型的输入,一方面,多模态的融合特征相较于 单一模态的特征,特征更全面,信息价值更高,可以提升图像分类的精准度,同时,将所述融合特征对应的分类概率值作为输入之一,可以提升所述预先训练的集成分类模型的学习效率。另一方面,利用所述预先训练的集成分类模型可以有效结合不同特征的机器学习模型的优势,提升图像分类的准确性。
如图4所示,是本申请一实施例提供的图像分类装置的功能模块图。
本申请所述图像分类装置100可以安装于电子设备中。根据实现的功能,所述图像分类装置100可以包括特征提取模块101、特征融合模块102、分类分析模块103。本申请所述模块也可以称之为单元,是指一种能够被电子设备处理器所执行,并且能够完成固定功能的一系列计算机程序段,其存储在电子设备的存储器中。
在本实施例中,关于各模块/单元的功能如下:
所述特征提取模块101,用于获取待分类图像,提取所述待分类图像的图像特征,识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
所述特征融合模块102,用于对所述图像特征及所述文本特征进行融合,得到融合特征;
所述分类分析模块103,用于利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值,利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
详细地,本申请实施例中所述图像分类装置100中的各个模块在使用时采用与上述的图1至图3中所述的图像分类方法一样的技术手段,并能够产生相同的技术效果,这里不再赘述。
如图5所示,是本申请一实施例提供的实现图像分类方法的电子设备的结构示意图。
所述电子设备1可以包括处理器10、存储器11和总线,还可以包括存储在所述存储器11中并可在所述处理器10上运行的计算机程序,如图像分类程序。
其中,所述存储器11至少包括一种类型的可读存储介质,所述可读存储介质包括闪存、移动硬盘、多媒体卡、卡型存储器(例如:SD或DX存储器等)、磁性存储器、磁盘、光盘等。所述存储器11在一些实施例中可以是电子设备1的内部存储单元,例如该电子设备1的移动硬盘。所述存储器11在另一些实施例中也可以是电子设备1的外部存储设备,例如电子设备1上配备的插接式移动硬盘、智能存储卡(SmartMediaCard,SMC)、安全数字(SecureDigital,SD)卡、闪存卡(FlashCard)等。进一步地,所述存储器11还可以既包括电子设备1的内部存储单元也包括外部存储设备。所述存储器11不仅可以用于存储安装于电子设备1的应用软件及各类数据,例如图像分类程序的代码等,还可以用于暂时地存储已经输出或者将要输出的数据。
所述处理器10在一些实施例中可以由集成电路组成,例如可以由单个封装的集成电路所组成,也可以是由多个相同功能或不同功能封装的集成电路所组成,包括一个或者多个中央处理器(CentralProcessingunit,CPU)、微处理器、数字处理芯片、图形处理器及各种控制芯片的组合等。所述处理器10是所述电子设备的控制核心(ControlUnit),利用各种接口和线路连接整个电子设备的各个部件,通过运行或执行存储在所述存储器11内的程序或者模块(例如图像分类程序等),以及调用存储在所述存储器11内的数据,以执行电子设备1的各种功能和处理数据。
所述总线可以是外设部件互连标准(peripheralcomponentinterconnect,简称PCI)总线或扩展工业标准结构(extendedindustrystandardarchitecture,简称EISA)总线等。该总线可以分为地址总线、数据总线、控制总线等。所述总线被设置为实现所述存储器11以及至少一个处理器10等之间的连接通信。
图5仅示出了具有部件的电子设备,本领域技术人员可以理解的是,图5示出的结构并不构成对所述电子设备1的限定,可以包括比图示更少或者更多的部件,或者组合某些 部件,或者不同的部件布置。
例如,尽管未示出,所述电子设备1还可以包括给各个部件供电的电源(比如电池),优选地,电源可以通过电源管理装置与所述至少一个处理器10逻辑相连,从而通过电源管理装置实现充电管理、放电管理、以及功耗管理等功能。电源还可以包括一个或一个以上的直流或交流电源、再充电装置、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。所述电子设备1还可以包括多种传感器、蓝牙模块、Wi-Fi模块等,在此不再赘述。
进一步地,所述电子设备1还可以包括网络接口,可选地,所述网络接口可以包括有线接口和/或无线接口(如WI-FI接口、蓝牙接口等),通常用于在该电子设备1与其他电子设备之间建立通信连接。
可选地,该电子设备1还可以包括用户接口,用户接口可以是显示器(Display)、输入单元(比如键盘(Keyboard)),可选地,用户接口还可以是标准的有线接口、无线接口。可选地,在一些实施例中,显示器可以是LED显示器、液晶显示器、触控式液晶显示器以及OLED(OrganicLight-EmittingDiode,有机发光二极管)触摸器等。其中,显示器也可以适当的称为显示屏或显示单元,用于显示在电子设备1中处理的信息以及用于显示可视化的用户界面。
应该了解,所述实施例仅为说明之用,在专利申请范围上并不受此结构的限制。
所述电子设备1中的所述存储器11存储的图像分类程序是多个指令的组合,在所述处理器10中运行时,可以实现:
获取待分类图像,提取所述待分类图像的图像特征;
识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
对所述图像特征及所述文本特征进行融合,得到融合特征;
利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
进一步地,所述电子设备1集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读存储介质中。所述计算机可读存储介质可以是易失性的,也可以是非易失性的。例如,所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-OnlyMemory)。
本申请还提供一种计算机可读存储介质,所述计算机可读存储介质可以是易失性的,也可以是非易失性的。所述可读存储介质存储有计算机程序,所述计算机程序在被电子设备的处理器所执行时,可以实现:
获取待分类图像,提取所述待分类图像的图像特征;
识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
对所述图像特征及所述文本特征进行融合,得到融合特征;
利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。
因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附关联图标记视为限制所涉及的权利要求。
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。
本申请实施例可以基于人工智能技术对相关的数据进行获取和处理。其中,人工智能(ArtificialIntelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用***。
此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。***权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第二等词语用来表示名称,而并不表示任何特定的顺序。
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。

Claims (20)

  1. 一种图像分类方法,其中,所述方法包括:
    获取待分类图像,提取所述待分类图像的图像特征;
    识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
    对所述图像特征及所述文本特征进行融合,得到融合特征;
    利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
    利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
  2. 如权利要求1所述的图像分类方法,其中,所述提取所述待分类图像的图像特征,包括:
    对所述待分类图像进行色彩空间归一化处理,得到标准图像;
    将每张所述标准图像按照预设比例划分为多个图像块,计算每个所述图像块中每个像素的像素梯度,根据所述像素梯度统计得到每个所述图像块的梯度直方图;
    将所述梯度直方图转换为向量,并将所有梯度直方图的向量进行拼接,得到所述待分类图像的图像特征。
  3. 如权利要求1所述的图像分类方法,其中,所述提取所述文本内容的文本特征,包括:
    对所述文本内容进行分词,得到多个文本分词;
    生成每个所述文本分词对应的词向量,利用所有所述词向量生成所述文本内容对应的文本向量矩阵;
    从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,根据所述目标分词的词向量及所述文本向量矩阵,计算所述目标分词的关键值;
    按照所述关键值从大到小的顺序从所述多个文本分词中选取预设数量的文本分词为特征分词;
    将所述特征分词的词向量进行拼接,得到所述文本内容的文本特征。
  4. 如权利要求3所述的图像分类方法,其中,所述利用所有所述词向量生成所述文本内容对应的文本向量矩阵,包括:
    从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,并统计所述目标分词和所述目标分词的相邻文本分词在所述目标分词的预设邻域范围内共同出现的共现次数;
    利用每一个文本分词对应的共现次数构建共现矩阵;
    将所有所述词向量拼接为向量矩阵;
    利用所述共现矩阵和所述向量矩阵进行乘积运算,得到所述文本内容对应的文本向量矩阵。
  5. 如权利要求1所述的图像分类方法,其中,所述对所述图像特征及所述文本特征进行融合,得到融合特征,包括:
    对所述图像特征进行矩阵转换处理,得到与所述文本特征相同维度的图像特征;
    利用预设的全连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征。
  6. 如权利要求5所述的图像分类方法,其中,所述利用预设的全连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征,包括:
    利用如下预设的融合函数,生成融合特征:
    F=dense(softmax(dot(Q,transpose(K)))*K)
    F为所述融合特征,Q为所述转换后的图像特征,K为所述文本特征,transpose为转置 函数,dot为矩阵乘法,softmax为激活函数,dense为所述预设的全连接层网络的卷积计算算法。
  7. 如权利要求3所述的图像分类方法,其中,所述根据所述目标分词的词向量及所述文本向量矩阵,计算所述目标分词的关键值,包括:
    利用如下关键值算法计算所述目标分词的关键值:
    Figure PCTCN2022090437-appb-100001
    其中,K为所述关键值,|W|为所述文本向量矩阵,T为矩阵转置符号,||为求模符号,
    Figure PCTCN2022090437-appb-100002
    为所述目标分词的词向量。
  8. 一种图像分类装置,其中,所述装置包括:
    特征提取模块,用于获取待分类图像,提取所述待分类图像的图像特征,识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
    特征融合模块,用于对所述图像特征及所述文本特征进行融合,得到融合特征;
    分类分析模块,用于利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值,利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
  9. 一种电子设备,其中,所述电子设备包括:
    至少一个处理器;以及,
    与所述至少一个处理器通信连接的存储器;其中,
    所述存储器存储有可被所述至少一个处理器执行的计算机程序,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如下所述的图像分类方法:
    获取待分类图像,提取所述待分类图像的图像特征;
    识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
    对所述图像特征及所述文本特征进行融合,得到融合特征;
    利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
    利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
  10. 如权利要求9所述的电子设备,其中,所述提取所述待分类图像的图像特征,包括:
    对所述待分类图像进行色彩空间归一化处理,得到标准图像;
    将每张所述标准图像按照预设比例划分为多个图像块,计算每个所述图像块中每个像素的像素梯度,根据所述像素梯度统计得到每个所述图像块的梯度直方图;
    将所述梯度直方图转换为向量,并将所有梯度直方图的向量进行拼接,得到所述待分类图像的图像特征。
  11. 如权利要求9所述的电子设备,其中,所述提取所述文本内容的文本特征,包括:
    对所述文本内容进行分词,得到多个文本分词;
    生成每个所述文本分词对应的词向量,利用所有所述词向量生成所述文本内容对应的文本向量矩阵;
    从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,根据所述目标分词的词向量及所述文本向量矩阵,计算所述目标分词的关键值;
    按照所述关键值从大到小的顺序从所述多个文本分词中选取预设数量的文本分词为特征分词;
    将所述特征分词的词向量进行拼接,得到所述文本内容的文本特征。
  12. 如权利要求11所述的电子设备,其中,所述利用所有所述词向量生成所述文本内容对应的文本向量矩阵,包括:
    从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,并统计所述目标分 词和所述目标分词的相邻文本分词在所述目标分词的预设邻域范围内共同出现的共现次数;
    利用每一个文本分词对应的共现次数构建共现矩阵;
    将所有所述词向量拼接为向量矩阵;
    利用所述共现矩阵和所述向量矩阵进行乘积运算,得到所述文本内容对应的文本向量矩阵。
  13. 如权利要求9所述的电子设备,其中,所述对所述图像特征及所述文本特征进行融合,得到融合特征,包括:
    对所述图像特征进行矩阵转换处理,得到与所述文本特征相同维度的图像特征;
    利用预设的全连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征。
  14. 如权利要求13所述的电子设备,其中,所述利用预设的全连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征,包括:
    利用如下预设的融合函数,生成融合特征:
    F=dense(softmax(dot(Q,transpose(K)))*K)
    F为所述融合特征,Q为所述转换后的图像特征,K为所述文本特征,transpose为转置函数,dot为矩阵乘法,softmax为激活函数,dense为所述预设的全连接层网络的卷积计算算法。
  15. 一种计算机可读存储介质,存储有计算机程序,其中,所述计算机程序被处理器执行时实现如下所述的图像分类方法:
    获取待分类图像,提取所述待分类图像的图像特征;
    识别所述待分类图像中的文本内容,提取所述文本内容的文本特征;
    对所述图像特征及所述文本特征进行融合,得到融合特征;
    利用预先训练的激活函数计算所述融合特征与预设的多个分类标签之间的概率值;
    利用预先训练的集成分类模型,根据所述融合特征及所述概率值对所述待分类图像进行图像分类分析,得到所述待分类图像的分类结果。
  16. 如权利要求15所述的计算机可读存储介质,其中,所述提取所述待分类图像的图像特征,包括:
    对所述待分类图像进行色彩空间归一化处理,得到标准图像;
    将每张所述标准图像按照预设比例划分为多个图像块,计算每个所述图像块中每个像素的像素梯度,根据所述像素梯度统计得到每个所述图像块的梯度直方图;
    将所述梯度直方图转换为向量,并将所有梯度直方图的向量进行拼接,得到所述待分类图像的图像特征。
  17. 如权利要求15所述的计算机可读存储介质,其中,所述提取所述文本内容的文本特征,包括:
    对所述文本内容进行分词,得到多个文本分词;
    生成每个所述文本分词对应的词向量,利用所有所述词向量生成所述文本内容对应的文本向量矩阵;
    从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,根据所述目标分词的词向量及所述文本向量矩阵,计算所述目标分词的关键值;
    按照所述关键值从大到小的顺序从所述多个文本分词中选取预设数量的文本分词为特征分词;
    将所述特征分词的词向量进行拼接,得到所述文本内容的文本特征。
  18. 如权利要求17所述的计算机可读存储介质,其中,所述利用所有所述词向量生成所述文本内容对应的文本向量矩阵,包括:
    从所述多个文本分词中逐个选取其中一个文本分词作为目标分词,并统计所述目标分词和所述目标分词的相邻文本分词在所述目标分词的预设邻域范围内共同出现的共现次数;
    利用每一个文本分词对应的共现次数构建共现矩阵;
    将所有所述词向量拼接为向量矩阵;
    利用所述共现矩阵和所述向量矩阵进行乘积运算,得到所述文本内容对应的文本向量矩阵。
  19. 如权利要求15所述的计算机可读存储介质,其中,所述对所述图像特征及所述文本特征进行融合,得到融合特征,包括:
    对所述图像特征进行矩阵转换处理,得到与所述文本特征相同维度的图像特征;
    利用预设的全连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征。
  20. 如权利要求19所述的计算机可读存储介质,其中,所述利用预设的全连接层网络将所述文本特征及转换后的图像特征进行关联,得到融合特征,包括:
    利用如下预设的融合函数,生成融合特征:
    F=dense(softmax(dot(Q,transpose(K)))*K)
    F为所述融合特征,Q为所述转换后的图像特征,K为所述文本特征,transpose为转置函数,dot为矩阵乘法,softmax为激活函数,dense为所述预设的全连接层网络的卷积计算算法。

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210299096.0A CN114677526A (zh) 2022-03-25 2022-03-25 图像分类方法、装置、设备及介质
CN202210299096.0 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023178798A1 true WO2023178798A1 (zh) 2023-09-28

Family

ID=82074693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090437 WO2023178798A1 (zh) 2022-03-25 2022-04-29 图像分类方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN114677526A (zh)
WO (1) WO2023178798A1 (zh)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160253466A1 (en) * 2013-10-10 2016-09-01 Board Of Regents, The University Of Texas System Systems and methods for quantitative analysis of histopathology images using multiclassifier ensemble schemes
CN107683469A (zh) * 2015-12-30 2018-02-09 中国科学院深圳先进技术研究院 一种基于深度学习的产品分类方法及装置
CN109784394A (zh) * 2019-01-07 2019-05-21 平安科技(深圳)有限公司 一种翻拍图像的识别方法、***及终端设备
CN111444960A (zh) * 2020-03-26 2020-07-24 上海交通大学 基于多模态数据输入的皮肤病图像分类***
CN113449821A (zh) * 2021-08-31 2021-09-28 浙江宇视科技有限公司 融合语义和图像特征的智能训练方法、装置、设备及介质
CN113670478A (zh) * 2021-07-09 2021-11-19 广州市倍尔康医疗器械有限公司 基于测温仪的温度数据的修正方法、***、装置及介质
CN113870478A (zh) * 2021-09-29 2021-12-31 平安银行股份有限公司 快速取号方法、装置、电子设备及存储介质
CN114049676A (zh) * 2021-11-29 2022-02-15 中国平安财产保险股份有限公司 疲劳状态检测方法、装置、设备及存储介质
CN114187476A (zh) * 2021-12-14 2022-03-15 中国平安财产保险股份有限公司 基于图像分析的车险信息核对方法、装置、设备及介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117943213A (zh) * 2024-03-27 2024-04-30 浙江艾领创矿业科技有限公司 微泡浮选机的实时监测预警***及方法
CN117943213B (zh) * 2024-03-27 2024-06-04 浙江艾领创矿业科技有限公司 微泡浮选机的实时监测预警***及方法

Also Published As

Publication number Publication date
CN114677526A (zh) 2022-06-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932855

Country of ref document: EP

Kind code of ref document: A1