CN117056836B - Program classification model training and program category identification method and device - Google Patents

Program classification model training and program category identification method and device

Info

Publication number
CN117056836B
CN117056836B (application number CN202311321797.0A)
Authority
CN
China
Prior art keywords
program
classification model
sample
category
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311321797.0A
Other languages
Chinese (zh)
Other versions
CN117056836A (en)
Inventor
杨玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311321797.0A
Publication of CN117056836A
Application granted
Publication of CN117056836B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/243: Classification techniques relating to the number of classes
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method for a program classification model, a program category identification method, and a device, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring first multi-mode data and first preset category information, wherein the first preset category information is obtained by identifying the category of a first sample program based on a first program classification model and the first multi-mode data; the first program classification model is obtained by performing category identification training on a first to-be-trained classification model based on second multi-mode data and second preset category information; and the second preset category information is obtained by identifying the category of a second sample program based on a second program classification model and a second sample program name; inputting the first multi-mode data into a second to-be-trained classification model for category identification to obtain first prediction category information; and training the second to-be-trained classification model according to the first prediction category information and the first preset category information to obtain a program classification model. The technical scheme provided by the application can improve the accuracy of program category identification.

Description

Program classification model training and program category identification method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for training a program classification model and identifying program categories.
Background
The information of a merchant applet is an important basis for judging the purpose of the applet. In the prior art, a text model is built over the applet name alone, and the category of the applet is predicted from its name.
However, the category of some applets cannot be accurately predicted from the applet name alone. For example, some applet names are short and provide insufficient effective information, failing to reflect the actual operating content of the applet, so the prediction effect of the model is poor; or the applet name is inconsistent with the applet category, so the prediction accuracy of the model is low. In addition, the category information used in model training in the prior art needs to be manually labeled, which is time-consuming and labor-intensive and makes model training inefficient. Therefore, there is a need for a more efficient way to train program classification models, so as to improve both model training efficiency and program classification accuracy.
Disclosure of Invention
The application provides a method, an apparatus, a device, a storage medium and a computer program product for training a program classification model, which can improve the training efficiency of the program classification model and the accuracy of program category identification, thereby improving the accuracy and effectiveness of category identification when the trained model is subsequently applied.
In one aspect, the present application provides a method for training a program classification model, the method comprising:
acquiring first multi-mode data corresponding to a first sample program and first preset category information corresponding to the first sample program; the first preset category information is obtained by category identification of a first sample program based on a first program classification model and the first multi-mode data corresponding to the first sample program; the first program classification model is obtained by performing category identification training on a first to-be-trained classification model based on second multi-mode data corresponding to a second sample program and second preset category information corresponding to the second sample program; the second preset category information is obtained by identifying the category of the second sample program based on a second program classification model and a second sample program name corresponding to the second sample program;
inputting the first multi-mode data into a second classification model to be trained for category identification, and obtaining first prediction category information corresponding to the first sample program;
and training the second classification model to be trained according to the first prediction category information and the first preset category information to obtain a target program classification model.
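The three steps above amount to training a "student" classifier on pseudo-labels produced by an already-trained "teacher" model. The following sketch illustrates this idea with a plain softmax classifier in NumPy; the linear model form, feature dimensions and training hyperparameters are illustrative assumptions, not the architecture claimed by the application.

```python
import numpy as np

# Illustrative sketch only: the application does not specify this model form.
rng = np.random.default_rng(0)

def predict_probs(weights, features):
    """Softmax classifier head over (already fused) multi-mode features."""
    logits = features @ weights
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def train_on_labels(features, target_probs, n_classes, lr=0.5, epochs=300):
    """Fit a linear softmax classifier to (possibly soft) pseudo-labels."""
    weights = np.zeros((features.shape[1], n_classes))
    for _ in range(epochs):
        probs = predict_probs(weights, features)
        # Standard softmax-regression gradient of cross-entropy loss.
        grad = features.T @ (probs - target_probs) / len(features)
        weights -= lr * grad
    return weights

n_samples, n_feat, n_classes = 64, 8, 3
first_multimode = rng.normal(size=(n_samples, n_feat))   # stand-in fused features
seed_labels = np.eye(n_classes)[rng.integers(0, n_classes, n_samples)]

# "First program classification model": trained on pseudo-labels from upstream.
teacher_w = train_on_labels(first_multimode, seed_labels, n_classes)
first_preset = predict_probs(teacher_w, first_multimode)  # first preset category info

# "Second to-be-trained classification model": trained against first_preset.
student_w = train_on_labels(first_multimode, first_preset, n_classes)
first_prediction = predict_probs(student_w, first_multimode)
agreement = (first_prediction.argmax(1) == first_preset.argmax(1)).mean()
```

Because the student is supervised by the teacher's output rather than manual labels, no human annotation enters this stage, which is the source of the efficiency gain described above.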
Another aspect provides a program category identification method, the method comprising:
acquiring target multi-mode data corresponding to a target program;
inputting the target multi-mode data into a target program classification model, and identifying the category of the target program to obtain target category information corresponding to the target program, wherein the target program classification model is obtained by the training method of the program classification model according to any one of the above.
In another aspect, a training apparatus for a program classification model is provided, the apparatus comprising:
the information acquisition module is used for acquiring first multi-mode data corresponding to a first sample program and first preset category information corresponding to the first sample program; the first preset category information is obtained by category identification of a first sample program based on a first program classification model and the first multi-mode data corresponding to the first sample program; the first program classification model is obtained by performing category identification training on a first to-be-trained classification model based on second multi-mode data corresponding to a second sample program and second preset category information corresponding to the second sample program; the second preset category information is obtained by identifying the category of the second sample program based on a second program classification model and a second sample program name corresponding to the second sample program;
The category identification processing module is used for inputting the first multi-mode data into a second classification model to be trained to identify categories, and obtaining first prediction category information corresponding to the first sample program;
and the model training module is used for training the second classification model to be trained according to the first prediction category information and the first preset category information to obtain a target program classification model.
Another aspect provides a program category identification apparatus, the apparatus comprising:
the data acquisition module is used for acquiring target multi-mode data corresponding to the target program;
the category information determining module is used for inputting the target multi-mode data into a target program classification model, identifying the category of the target program, and obtaining target category information corresponding to the target program, wherein the target program classification model is obtained based on the training method of the program classification model.
Another aspect provides an electronic device, comprising: a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method or program category identification method of the program classification model of any one of the above.
Another aspect provides a computer readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the program classification model or the program category identification method of any one of the above.
Another aspect provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the training method of the program classification model or the program category identification method provided in the above-described various alternative implementations.
The program classification model training and program category identification method, device, equipment, storage medium and computer program product provided by the application have the following technical effects:
in a program category identification scenario, target category information corresponding to a target program is obtained by inputting target multi-mode data corresponding to the target program into a target program classification model for category identification. The target program classification model is obtained by training a second to-be-trained classification model based on first multi-mode data corresponding to a first sample program and first preset category information corresponding to the first sample program. The first preset category information is obtained by identifying the category of the first sample program based on the first program classification model and the first multi-mode data; the first program classification model is obtained by performing category identification training on the first to-be-trained classification model based on second multi-mode data corresponding to a second sample program and second preset category information corresponding to the second sample program; and the second preset category information is obtained by identifying the category of the second sample program based on the second program classification model and the second sample program name corresponding to the second sample program. In this way, the effectiveness and efficiency of model training combining multi-mode data can be improved, the accuracy of program category identification can be better improved, the maintenance and development costs of the model can be reduced, and the accuracy and effectiveness of category identification in the program category identification process can be further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the application and of the prior art, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an application environment of a training method of a program classification model according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method of a program classification model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of acquiring first preset category information corresponding to a first sample program according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a first program classification model obtained by performing category identification training on a first classification model to be trained based on second multi-modal data and second preset category information according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a method for inputting first multi-mode data into a second classification model to be trained to identify categories and obtaining first prediction category information corresponding to a first sample program according to the embodiment of the present application;
FIG. 6 is a schematic diagram of a process for category identification by a second classification model to be trained according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a program category identification method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training device for a program classification model according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a program category identifying device according to an embodiment of the present application;
FIG. 10 is a block diagram of an electronic device for training of program classification models or program category identification provided by an embodiment of the application;
FIG. 11 is a block diagram of another electronic device for training of program classification models or program category identification provided by an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like herein are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive discipline spanning a wide range of fields and covering both hardware-level and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing concerns natural language, i.e. the language people use in daily life, and is therefore closely related to linguistics; at the same time, it involves important model training techniques in computer science, mathematics and artificial intelligence, where pre-trained models in the NLP field have developed into large language models (Large Language Model, LLM). Through fine tuning, large language models can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like.
The scheme provided by the embodiment of the application relates to artificial intelligence natural language processing and other technologies, in particular to processing such as training of a program classification model based on natural language processing and program category identification, and the method is specifically described by the following embodiments:
referring to fig. 1, fig. 1 is a schematic diagram of an application environment of a training method of a program classification model according to an embodiment of the present application, where the application environment may at least include a server 100 and a terminal 200.
In an alternative embodiment, the server 100 may be used to perform the training process of the program classification model, where the server 100 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides a cloud computing service.
In an alternative embodiment, terminal 200 may be used to provide services such as program category identification to a user based on the program classification model. Specifically, the terminal 200 may include, but is not limited to, smart phones, desktop computers, tablet computers, notebook computers, smart speakers, digital assistants, augmented reality (AR)/virtual reality (VR) devices, smart wearable devices, vehicle-mounted terminals, smart televisions, and other types of electronic devices, or software running on such electronic devices, such as an application or applet. Operating systems running on the electronic device in embodiments of the present application may include, but are not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that, fig. 1 is merely an application environment of a training method of a program classification model, and the embodiment of the present disclosure is not limited to the above.
In the embodiment of the present disclosure, the server 100 and the terminal 200 may be directly or indirectly connected through a wired or wireless communication method, which is not limited herein.
In the related art, in the category identification scenario of an applet, the category of the applet may be identified based on the applet name; however, when the applet name is short, provides insufficient effective information, or is inconsistent with the actual category of the applet, the name-based model has a poor prediction effect and low prediction accuracy, and other information about the applet is not effectively combined to help the name model identify the applet category.
The training method of a program classification model according to the present application is described below. FIG. 2 is a schematic flow chart of a training method of a program classification model according to an embodiment of the present application. The present specification provides the method operation steps as an example or flowchart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the unique order of execution. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be executed sequentially or in parallel (for example, in a parallel processor or multithreaded environment). Specifically, as shown in FIG. 2, the method may include:
S201: and acquiring first multi-mode data corresponding to the first sample program and first preset category information corresponding to the first sample program.
In a particular embodiment, the first sample program may include a plurality of applets in an applet platform. Alternatively, the plurality of applets may be applets of a plurality of different merchants. Optionally, the first multi-mode data may be multi-mode data corresponding to the first sample program; specifically, the multi-mode data may include text, image, and other data. Correspondingly, the first multi-mode data includes a first sample program text corresponding to the first sample program and a first sample program image corresponding to the first sample program. Specifically, the first sample program text may be text describing the first sample program; alternatively, the first sample program text may include a first sample program name of the first sample program and a first sample identification text corresponding to the first sample program image. Specifically, the first sample program name may be the program name corresponding to the first sample program, and the first sample identification text may be text recognized from the first sample program image. Specifically, the first sample program image may be an image describing the first sample program; alternatively, the first sample program image may be a program screenshot corresponding to the first sample program.
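As a concrete illustration, the first multi-mode data described above can be held in a simple container pairing the program's text fields with the program image. All field names and sample values below are assumptions for the sketch, not terminology or data from the application.

```python
from dataclasses import dataclass, field

@dataclass
class AppletSample:
    """Hypothetical container for one sample program's multi-mode data."""
    program_name: str                 # first sample program name
    identification_text: str          # text recognized from the program screenshot
    screenshot: bytes = b""           # first sample program image, e.g. PNG bytes
    digest: str = ""                  # optional program digest
    reviews: list = field(default_factory=list)  # optional evaluation texts

    def combined_text(self) -> str:
        """Concatenate all textual modalities into one description string."""
        parts = [self.program_name, self.identification_text, self.digest, *self.reviews]
        return " ".join(p for p in parts if p)

sample = AppletSample(
    program_name="City Noodle House",
    identification_text="order pickup menu noodles delivery",
    digest="Online ordering for a noodle restaurant",
)
```

Grouping the modalities this way keeps the name, the OCR text, and the screenshot bytes travelling together through the training pipeline.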
In another specific embodiment, the first sample program text may further include a first sample program digest of the first sample program, first sample program evaluation information, and the like. Specifically, the first sample program digest may be text describing the primary functions of the first sample program, and the first sample program evaluation information may be text describing users' experience with the first sample program.
In a specific embodiment, in a case where the content of the text identified from the first sample program image is relatively complicated, the text identified from the first sample program image may first be subjected to processing such as cleaning, stop-word removal, and word segmentation, and then used as the first sample identification text. Alternatively, in a case where the text identified from the first sample program image is long, TF-IDF (Term Frequency-Inverse Document Frequency), a common weighting technique for information retrieval and text mining, may be used to perform keyword extraction on the text identified from the first sample program image, and the extracted keywords may be used as the first sample identification text.
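A minimal from-scratch version of this TF-IDF keyword-extraction step might look as follows. The tokenizer and smoothing scheme are assumptions for the sketch; a production system would typically rely on an established library instead.

```python
import math
import re
from collections import Counter

def tfidf_keywords(doc: str, corpus: list, top_k: int = 3) -> list:
    """Score each term in doc by TF-IDF against the corpus, return the top_k terms."""
    tokenize = lambda t: re.findall(r"[a-z]+", t.lower())
    docs_tokens = [set(tokenize(d)) for d in corpus]
    tf = Counter(tokenize(doc))
    total = sum(tf.values())
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(term in toks for toks in docs_tokens)   # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1      # smoothed IDF
        scores[term] = (count / total) * idf
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

# Hypothetical OCR texts recognized from three program screenshots.
corpus = [
    "order noodles and soup for delivery",
    "book movie tickets and seats",
    "order pickup menu noodles delivery order",
]
keywords = tfidf_keywords(corpus[2], corpus, top_k=2)
```

Terms that are frequent in the screenshot text but rare across the corpus rank highest, so the extracted keywords compress a long OCR result into a short identification text.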
In a specific embodiment, the first preset category information may be program category information characterizing each applet in the first sample program. Specifically, program categories may be divided according to actual application requirements, such as dining, entertainment, shopping, and the like. Alternatively, assuming that the total number of program categories in the applet platform is N (N is a positive integer), the first preset category information may be a 1×N vector, where each element in the vector corresponds to one program category; specifically, each element may represent the probability that the applet belongs to the corresponding program category (the probability is 1 if the applet belongs to the corresponding program category and 0 otherwise). Specifically, the program categories corresponding to the plurality of applets in the first sample program may cover all program categories in the applet platform. Specifically, each applet corresponds to one program category, and each program category corresponds to at least one applet.
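For example, with N = 3 illustrative categories, the 1×N category vector described above reduces to a one-hot probability vector:

```python
# Category names are illustrative; the applet platform's real taxonomy may differ.
CATEGORIES = ["dining", "entertainment", "shopping"]  # N = 3

def category_vector(category: str) -> list:
    """1×N vector: probability 1.0 at the applet's category, 0.0 elsewhere."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return [1.0 if c == category else 0.0 for c in CATEGORIES]

vec = category_vector("dining")
```

Each applet thus contributes exactly one 1.0 entry, and the vector doubles as a probability distribution over the platform's categories.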
In a specific embodiment, the first preset category information is obtained by identifying the category of the first sample program based on the first program classification model and first multi-mode data corresponding to the first sample program; specifically, the first multi-mode data may be input into the first program classification model to perform category identification, so as to obtain first preset category information. Specifically, the model structure of the first program classification model may be set in combination with practical applications, and specifically, the first program classification model may be a deep learning model trained in advance and used for performing category recognition based on multi-mode data of the applet. Optionally, the first program classification model may be obtained by performing category identification training on the first to-be-trained classification model based on second multi-modal data corresponding to the second sample program and second preset category information corresponding to the second sample program; the second preset category information is obtained by identifying the category of the second sample program based on the second program classification model and the second sample program name corresponding to the second sample program.
In a specific embodiment, the second sample program may include a plurality of applets in the applet platform, and optionally, the applets included in the first sample program may be the same as or different from the applets included in the second sample program. The second multi-modal data may be multi-modal data corresponding to the second sample procedure; in particular, for a specific refinement of the second multi-modal data, reference may be made to the above-mentioned related refinement of the first multi-modal data, which is not described herein.
In a specific embodiment, the second preset category information may be program category information characterizing each applet in the second sample program; specifically, for a specific refinement of the second preset category information, reference may be made to the related refinement of the first preset category information, which is not described herein.
In a specific embodiment, the model structure of the second program classification model may be set in conjunction with practical applications, and in particular, the second program classification model may be a pre-trained deep learning model that performs category recognition based on the program name of the applet. Specifically, the second sample program name may be input into a second program classification model to identify the category, so as to obtain the second preset category information. Specifically, the second sample program name may be a program name corresponding to the second sample program.
In an alternative embodiment, FIG. 3 is a flowchart illustrating a process of acquiring the first preset category information corresponding to the first sample program according to an embodiment of the present application. As shown in FIG. 3, the first preset category information corresponding to the first sample program is acquired in the following manner:
S2011: acquiring a second sample program name and second multi-mode data;
S2013: inputting the second sample program name into a second program classification model, and identifying the category of the second sample program to obtain second preset category information corresponding to the second sample program;
S2015: performing category identification training on the first to-be-trained classification model based on the second multi-mode data and the second preset category information to obtain a first program classification model;
S2017: inputting the first multi-mode data into the first program classification model, and identifying the category of the first sample program to obtain first preset category information corresponding to the first sample program.
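Steps S2011 to S2017 can be sketched as a pipeline of stub models. Every rule and data value below is a placeholder invented for illustration, since the application does not constrain the internals of either classification model.

```python
def second_program_classifier(program_name: str) -> str:
    """S2013 stub: category identification from the program name alone."""
    return "dining" if "noodle" in program_name.lower() else "other"

def train_first_classifier(multimode_data, pseudo_labels):
    """S2015 stub: 'train' a multi-mode model by memorizing OCR-text -> label."""
    return dict(zip((d["ocr_text"] for d in multimode_data), pseudo_labels))

# S2011: second sample program names and their multi-mode data (placeholders).
second_names = ["City Noodle House", "Movie Ticket Hub"]
second_data = [{"ocr_text": "menu noodles"}, {"ocr_text": "tickets seats"}]

second_preset = [second_program_classifier(n) for n in second_names]  # S2013
first_model = train_first_classifier(second_data, second_preset)      # S2015
# S2017: apply the "first program classification model" to first multi-mode data.
first_preset = first_model.get("menu noodles", "other")
```

The point of the pipeline is the direction of supervision: labels flow from the name-only model to the multi-mode model, and from there to the final model, with no manual annotation at any step.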
In a specific embodiment, the second sample program name may be a program name corresponding to the second sample program.
In an optional embodiment, the second multimodal data includes a second sample program name, a second sample program image, and a second sample identification text corresponding to the second sample program image; correspondingly, fig. 4 is a schematic flow chart of a first program classification model obtained by performing category identification training on a first to-be-trained classification model based on second multi-modal data and second preset category information according to an embodiment of the present application; as shown in fig. 4, performing category identification training on the first to-be-trained classification model based on the second multimodal data and the second preset category information to obtain a first program classification model may include:
S401: acquiring first preset weight information corresponding to a second sample program name, second preset weight information corresponding to a second sample identification text and third preset weight information corresponding to a second sample program image;
S403: performing weighted fusion processing on the second sample program name, the second sample identification text and the second sample program image based on the first preset weight information, the second preset weight information and the third preset weight information to obtain a second program fusion feature;
S405: inputting the second program fusion feature into the first to-be-trained classification model for category identification, and obtaining second prediction category information corresponding to the second sample program;
S407: training the first to-be-trained classification model according to the second prediction category information and the second preset category information to obtain a first program classification model.
In a specific embodiment, the first preset weight information may represent the feature importance degree of the second sample program name in the process of performing category identification training on the first to-be-trained classification model, and optionally, the first preset weight information may be the proportion of the second sample program name in the second program fusion feature; the second preset weight information may represent the feature importance degree of the second sample identification text in this process, and optionally may be the proportion of the second sample identification text in the second program fusion feature; the third preset weight information may represent the feature importance degree of the second sample program image in this process, and optionally may be the proportion of the second sample program image in the second program fusion feature. Specifically, the first preset weight information, the second preset weight information and the third preset weight information may be a, (1-a)/2, and (1-a)/2, respectively. Specifically, a can be set in combination with the practical application; optionally, a is the prediction probability output by the second program classification model when performing category identification on the second sample program according to the second sample program name.
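The a, (1-a)/2, (1-a)/2 scheme above can be written out directly: the name modality receives the teacher model's prediction confidence as its weight, and the identification text and image modalities split the remainder equally. The function name and the range check below are illustrative additions, not from the patent.

```python
# Sketch of the modality-weighting scheme: name weight = a (the teacher
# model's prediction confidence); text and image each get (1 - a) / 2.
def modality_weights(a: float):
    """Return (name_w, text_w, image_w) summing to 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a probability in [0, 1]")
    rest = (1.0 - a) / 2.0
    return a, rest, rest

w_name, w_text, w_image = modality_weights(0.6)
```

A high-confidence teacher prediction thus pushes the student to rely mostly on the program name, while a low-confidence one shifts weight toward the image and OCR text.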
In a specific embodiment, performing weighted fusion processing on the second sample program name, the second sample identification text and the second sample program image based on the first preset weight information, the second preset weight information and the third preset weight information, and obtaining the second program fusion feature may include: extracting features of the second sample program name to obtain second program name features; extracting features of the second sample recognition text to obtain second program recognition text features; extracting features of the second sample program image to obtain second program image features; and carrying out weighted fusion processing on the second program name feature, the second program identification text feature and the second program image feature based on the first preset weight information, the second preset weight information and the third preset weight information to obtain a second program fusion feature.
In a specific embodiment, the second program name feature is the name feature information corresponding to the second sample program; optionally, the second program name feature may be obtained by combining a preset text feature extraction model to perform feature extraction on the second sample program name. The second program identification text feature is the identification text feature information corresponding to the second sample program; optionally, the second program identification text feature may be obtained by combining a preset text feature extraction model to perform feature extraction on the second sample identification text. The second program image feature is the image feature information corresponding to the second sample program; optionally, the second program image feature may be obtained by combining a preset image feature extraction model to perform feature extraction on the second sample program image. The second program fusion feature is the fusion feature information corresponding to the second sample program after weighted fusion processing is carried out on the second program name feature, the second program identification text feature and the second program image feature; optionally, the second program fusion feature may be obtained by combining a preset feature fusion model to perform weighted fusion processing on the second program name feature, the second program identification text feature and the second program image feature.
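Assuming each modality has already been encoded into a feature vector of the same dimension, the weighted fusion step can be sketched as a weighted sum. A weighted sum is one simple realization of "proportion in the fusion feature"; the patent does not fix the exact fusion operator, so this is an illustrative assumption.

```python
import numpy as np

# Illustrative weighted fusion of three same-dimension modality features,
# using the a, (1-a)/2, (1-a)/2 weighting described above.
def weighted_fuse(name_feat, text_feat, image_feat, a):
    rest = (1.0 - a) / 2.0
    return a * name_feat + rest * text_feat + rest * image_feat

name_f = np.array([1.0, 0.0])
text_f = np.array([0.0, 1.0])
img_f = np.array([0.0, 1.0])
fused = weighted_fuse(name_f, text_f, img_f, a=0.5)
```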
In a specific embodiment, the second prediction category information may be program category information characterizing each applet in the second sample program identified by the first classification model to be trained.
In a specific embodiment, training the first to-be-trained classification model according to the second prediction category information and the second preset category information corresponding to the second sample program to obtain the first program classification model may include: determining a second category identification loss of the first to-be-trained classification model according to the second prediction category information and the second preset category information corresponding to the second sample program; and training the first to-be-trained classification model based on the second category identification loss to obtain the first program classification model.
In a specific embodiment, the second category identification loss may be calculated in combination with a preset loss function; alternatively, the preset loss function may be set in connection with the actual application requirement, such as an exponential loss function, a cross entropy loss function, etc. The second category identification loss may be indicative of accuracy of program category identification of the current first classification model to be trained.
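Using the cross-entropy option named above, the category identification loss is simply the negative log of the probability the model assigns to the preset (pseudo-label) category; a minimal sketch:

```python
import math

# Cross-entropy loss for one sample: -log(probability assigned to the
# preset category). Lower loss means the prediction matches the pseudo-label.
def cross_entropy(pred_probs, true_index):
    return -math.log(pred_probs[true_index])

loss = cross_entropy([0.7, 0.2, 0.1], true_index=0)
```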
In a specific embodiment, training the first to-be-trained classification model based on the second category identification loss may include: updating model parameters of the first to-be-trained classification model based on the second category identification loss, and repeating, with the updated model, the second training iteration step of inputting the multi-mode data corresponding to the second sample program into the first to-be-trained classification model for program category identification to obtain second prediction category information, until a second preset convergence condition is met. The second preset convergence condition may be that the second category identification loss is less than or equal to a second preset loss threshold, or that the number of second training iteration steps reaches a second preset number; specifically, the second preset loss threshold and the second preset number may be set in combination with the model precision and training speed requirements of the practical application.
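The stopping rule above (loss at or below a preset threshold, or a preset number of iterations reached) can be sketched as a generic loop; `step_fn` and the halving `fake_step` below are hypothetical stand-ins for one real forward/backward pass.

```python
# Sketch of the convergence rule: repeat the training step until the loss
# reaches a preset threshold or the preset iteration budget is exhausted.
def train_until_converged(step_fn, loss_threshold, max_steps):
    loss = float("inf")
    for step in range(1, max_steps + 1):
        loss = step_fn()          # one training iteration, returns loss
        if loss <= loss_threshold:
            break                 # preset convergence condition met
    return step, loss

# Toy step function whose loss halves every iteration.
state = {"loss": 1.0}
def fake_step():
    state["loss"] *= 0.5
    return state["loss"]

steps, final_loss = train_until_converged(fake_step, loss_threshold=0.1, max_steps=100)
```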
S203: and inputting the first multi-mode data into a second classification model to be trained to identify categories, and obtaining first prediction category information corresponding to the first sample program.
In a specific embodiment, the first prediction category information may be program category information characterizing each applet in the first sample program identified by the second classification model to be trained.
In an alternative embodiment, the second classification model to be trained may include: a text feature extraction model, an image feature extraction model, a feature fusion model and a classification model; fig. 5 is a schematic flow diagram of a process of inputting first multi-modal data into a second classification model to be trained to identify categories, so as to obtain first prediction category information corresponding to a first sample program; as shown in fig. 5, the inputting the first multi-mode data into the second classification model to be trained to perform category identification, and obtaining the first prediction category information corresponding to the first sample program includes:
S2031: inputting the first sample program text into the text feature extraction model to perform program text feature extraction, so as to obtain program text features;
S2033: inputting the first sample program image into the image feature extraction model to perform program image feature extraction, so as to obtain program image features;
S2035: inputting the program text features and the program image features into the feature fusion model to perform feature fusion processing, so as to obtain a first program fusion feature;
S2037: inputting the first program fusion feature into the classification model for classification processing, so as to obtain the first prediction category information.
In a specific embodiment, the program text features are the text feature information corresponding to the first sample program; alternatively, the program text features may be obtained by performing feature extraction on the first sample program text in combination with the text feature extraction model. The program image features are the image feature information corresponding to the first sample program; alternatively, the program image features may be obtained by performing feature extraction on the first sample program image in combination with the image feature extraction model. The first program fusion feature is the fusion feature information corresponding to the first sample program after feature fusion processing is carried out on the program text features and the program image features; optionally, the first program fusion feature may be obtained by performing feature fusion processing on the program text features and the program image features in combination with the feature fusion model.
In a specific embodiment, a text feature extraction model is used to extract feature information of the program text. Optionally, the model structure of the text feature extraction model may be set in combination with practical applications, and specifically, the text feature extraction model may include a word feature extraction layer, a sentence feature extraction layer, a semantic feature extraction layer, and a feature fusion layer. Specifically, inputting the first sample program text into the text feature extraction model to extract the program text features may include: inputting the first sample program text into the word feature extraction layer for word segmentation processing to obtain word feature information corresponding to the first sample program text; inputting the first sample program text into the sentence feature extraction layer for sentence classification processing to obtain sentence feature information corresponding to the first sample program text; inputting the first sample program text into the semantic feature extraction layer for semantic analysis processing to obtain semantic feature information corresponding to the first sample program text; and inputting the word feature information, the sentence feature information and the semantic feature information into the feature fusion layer for feature fusion processing, so as to obtain the program text features. For example, the text feature extraction model may be a Transformer-based bidirectional encoder (such as BERT), which may include a token embedding layer for converting each word in the text into feature information of a fixed dimension, a segment embedding layer for distinguishing the multiple input sentences in classification tasks, and a position embedding layer for encoding the position of each input word so that the word order of the text can contribute to its semantics.
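In a BERT-style encoder, the three embedding layers named above combine by element-wise addition: the input representation of each token is the sum of its token, segment, and position embedding lookups. The table sizes and random initialization below are illustrative, not the patent's configuration.

```python
import numpy as np

# Hypothetical sketch of BERT-style input embeddings: the representation
# of each token is token_emb + segment_emb + position_emb.
rng = np.random.default_rng(0)
VOCAB, MAX_LEN, DIM = 100, 16, 8
token_table = rng.normal(size=(VOCAB, DIM))
segment_table = rng.normal(size=(2, DIM))      # sentence A / sentence B
position_table = rng.normal(size=(MAX_LEN, DIM))

def embed(token_ids, segment_ids):
    positions = np.arange(len(token_ids))
    return (token_table[token_ids]
            + segment_table[segment_ids]
            + position_table[positions])

x = embed(np.array([5, 9, 3]), np.array([0, 0, 1]))  # shape (3, 8)
```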
In a specific embodiment, an image feature extraction model is used to extract feature information of the program image. Optionally, the model structure of the image feature extraction model may be set in combination with practical applications, and specifically, the image feature extraction model may include an image feature extraction layer, an image feature dimension reduction layer, and an image feature output layer. Specifically, inputting the first sample program image into the image feature extraction model to extract the program image features may include: inputting the first sample program image into the image feature extraction layer for feature extraction processing to obtain program image feature information corresponding to the first sample program image; inputting the program image feature information into the image feature dimension reduction layer for feature dimension reduction processing to obtain dimension-reduced program image feature information; and inputting the dimension-reduced program image feature information into the image feature output layer for feature fusion processing, so as to obtain the program image features. For example, the image feature extraction model may be a CNN-based encoder (an encoder built on a convolutional neural network), including a convolutional layer for extracting feature information, a pooling layer for compressing the image size and reducing the dimension of the feature information so as to accelerate the operation of the neural network, and a fully-connected layer for performing classification fusion processing on the feature information.
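The three CNN layer types named above can be sketched in plain numpy: a valid 2-D convolution (feature extraction), 2x2 max pooling (size and dimension reduction), and a fully-connected layer (fusion). Kernel, image, and sizes are toy values for illustration only.

```python
import numpy as np

# Minimal numpy sketch of a CNN encoder's layer types.
def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2x2(fm):
    """2x2 max pooling: halves each spatial dimension."""
    h, w = fm.shape[0] // 2, fm.shape[1] // 2
    return fm[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def dense(x, W, b):
    """Fully-connected layer over the flattened feature map."""
    return W @ x.ravel() + b

img = np.arange(36, dtype=float).reshape(6, 6)
fm = conv2d_valid(img, np.ones((3, 3)) / 9.0)  # 4x4 feature map (3x3 mean filter)
pooled = maxpool2x2(fm)                        # 2x2 after pooling
```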
In a specific embodiment, inputting the program text features and the program image features into the feature fusion model to perform feature fusion processing to obtain the first program fusion feature may include: the program text features correspond to first weight information and the program image features correspond to second weight information, and the program text features and the program image features are subjected to weighted fusion processing based on the first weight information and the second weight information to obtain the first program fusion feature. Alternatively, the feature fusion model may be set in combination with the actual application requirements; for example, the feature fusion model may be an MLP (Multilayer Perceptron). Optionally, in the training process of the classification model to be trained, the first weight information and the second weight information are model parameters in the feature fusion model and are continuously adjusted during training until the target program classification model is obtained.
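An MLP fusion model of the kind described above can be sketched as: concatenate the per-modality features, then apply one hidden layer. The learned weight matrices play the role of the first/second weight information; sizes and initialization here are illustrative assumptions.

```python
import numpy as np

# Sketch of an MLP feature-fusion model: concatenate modality features,
# then pass them through one ReLU hidden layer. W1/W2 are the trainable
# parameters that take the place of the weight information above.
def mlp_fuse(text_feat, image_feat, W1, b1, W2, b2):
    x = np.concatenate([text_feat, image_feat])
    hidden = np.maximum(0.0, W1 @ x + b1)  # ReLU
    return W2 @ hidden + b2

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 6)), np.zeros(4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)
fused = mlp_fuse(np.ones(3), np.ones(3), W1, b1, W2, b2)  # 3-dim fusion feature
```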
In a specific embodiment, the first program fusion feature is input into a classification model for classification processing, so as to obtain first prediction category information. Alternatively, the classification model may be set in connection with actual application requirements, for example, based on a softmax function or the like.
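The softmax-based classification head mentioned above maps the fusion feature's class scores to a probability per program category; a numerically stable sketch:

```python
import numpy as np

# Softmax classification head: convert class scores (logits) into a
# probability distribution over program categories.
def softmax(logits):
    z = np.exp(logits - np.max(logits))  # shift by max for stability
    return z / z.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
predicted_category = int(np.argmax(probs))
```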
In an alternative embodiment, the first sample program text may include a first sample program name and a first sample identification text corresponding to the first sample program image; program text features include program name features and program identification text features;
correspondingly, the inputting the first sample program text into the text feature extraction model to extract the program text features may include:
inputting the first sample program name into a text feature extraction model to extract the program name feature, so as to obtain the program name feature;
and inputting the first sample identification text into a text feature extraction model to extract the program identification text features, so as to obtain the program identification text features.
In a specific embodiment, the program name feature is name feature information corresponding to the first sample program; alternatively, the program name feature may be obtained by feature extraction of the first sample program name in combination with a text feature extraction model. Program identification text feature is identification text feature information corresponding to the first sample program; alternatively, the program identification text feature may be obtained by feature extraction of the first sample identification text in combination with a text feature extraction model.
In a specific embodiment, inputting the program text feature and the program image feature into the feature fusion model to perform feature fusion processing, and obtaining the first program fusion feature may include: program text features correspond to the first weight information, program image features correspond to the second weight information, program identification text features correspond to the third weight information, and weighting fusion processing is conducted on the program text features, the program image features and the program identification text features based on the first weight information, the second weight information and the third weight information to obtain first program fusion features. Optionally, in the training process of the classification model to be trained, the first weight information, the second weight information and the third weight information are model parameters in the feature fusion model, and are continuously adjusted in the training process until the target program classification model is trained.
In a specific embodiment, as shown in fig. 6, fig. 6 is a schematic process diagram of category identification performed by a second classification model to be trained according to an embodiment of the present application. Specifically, the first sample program name can be input into a text feature extraction model to extract the program name feature, so as to obtain the program name feature; then, inputting the first sample identification text into a text feature extraction model to extract the text features of the program identification, so as to obtain the text features of the program identification; then, inputting the first sample program image into an image feature extraction model to extract program image features, and obtaining the program image features; then, inputting the program name feature, the program identification text feature and the program image feature into a feature fusion model to perform feature fusion processing to obtain a first program fusion feature; and then, inputting the fusion characteristics of the first program into a classification model for classification processing to obtain the information of the first prediction category.
In the above embodiment, the program name feature, the program identification text feature and the program image feature are input into the feature fusion model to perform feature fusion processing to obtain the first program fusion feature, and then the first program fusion feature is input into the classification model to perform classification processing to obtain the first prediction category information, so that the feature information required by the second classification model to be trained can be enriched, the accuracy of model prediction can be improved, and the accuracy and the effectiveness of program category identification can be improved; and feature information of the program image and the text is fused, so that the model can learn more features, and the model also learns the corresponding relation between the image and the text, thereby improving the accuracy of program category identification.
S205: and training the second classification model to be trained according to the first prediction category information and the first preset category information to obtain a target program classification model.
In a specific embodiment, training the second classification model to be trained according to the first prediction category information and the first preset category information to obtain the target program classification model may include: determining a first category identification loss of the second classification model to be trained according to the first prediction category information and the first preset category information; and training the second classification model to be trained based on the first category identification loss to obtain the target program classification model.
In a specific embodiment, the first category identification loss may be calculated in combination with a preset loss function; alternatively, the preset loss function may be set in connection with the actual application requirement, such as an exponential loss function, a cross entropy loss function, etc. The first category identification loss may characterize accuracy of program category identification of the current second classification model to be trained.
In a specific embodiment, training the second classification model to be trained based on the first category identification loss may include: updating model parameters of the second classification model to be trained based on the first category identification loss, and repeating, with the updated model, the first training iteration step of inputting the multi-mode data corresponding to the first sample program into the second classification model to be trained for category identification to obtain first prediction category information and updating the model parameters based on the first category identification loss, until a first preset convergence condition is met. The first preset convergence condition may be that the first category identification loss is less than or equal to a first preset loss threshold, or that the number of first training iteration steps reaches a first preset number; specifically, the first preset loss threshold and the first preset number may be set in combination with the model precision and training speed requirements of the practical application.
As can be seen from the technical solutions provided in the embodiments of the present disclosure, in the present disclosure, the first multi-mode data corresponding to the first sample program is input into the second classification model to be trained for category identification to obtain the first prediction category information corresponding to the first sample program, and the second classification model to be trained is trained according to the first prediction category information and the first preset category information to obtain the target program classification model. Carrying out model training in combination with the multi-mode data of the first sample program can improve the effectiveness and efficiency of the program classification model, better improve the accuracy of program category identification, and reduce the maintenance and development costs of the model. The first preset category information is obtained by identifying the category of the first sample program based on the first program classification model and the first multi-mode data corresponding to the first sample program, so that preset category information with higher accuracy can be provided, manually labeled samples are saved, and the training efficiency of the program classification model is improved. The first program classification model is obtained by performing category identification training on the first to-be-trained classification model based on the second multi-mode data corresponding to the second sample program and the second preset category information corresponding to the second sample program; the second preset category information is obtained by identifying the category of the second sample program based on the second program classification model and the second sample program name corresponding to the second sample program, so that preset category information with higher accuracy can be provided, the training time of the program classification model is saved, the training efficiency of the program classification model is improved, and the accuracy of program category identification is improved.
The following describes a program category identification method of a program classification model trained based on the training method of the program classification model of the present application, and fig. 7 is a schematic flow chart of a program category identification method provided by an embodiment of the present application, as shown in fig. 7, where the method may include:
s701: acquiring target multi-mode data corresponding to a target program;
in a specific embodiment, the target program may be any applet that requires category identification. Optionally, the target multi-modal data may be the multi-modal data corresponding to the target program; specifically, the multi-modal data may include text, image and other data, where the target multi-modal data includes a target program text corresponding to the target program and a target program image corresponding to the target program. Specifically, the target program text may be text describing the target program; optionally, the target program text may include a target program name of the target program and a target identification text corresponding to the target program image. Specifically, the target program name may be the program name corresponding to the target program, and the target identification text may be text recognized from the target program image. Specifically, the target program image may be an image describing the target program; alternatively, the target program image may be a program screenshot corresponding to the target program.
In another specific embodiment, the target program text may further include a target program abstract and target program evaluation information of the target program; specifically, the target program abstract may be text summarizing the main functions of the target program, and the target program evaluation information may be text describing users' experience of using the target program.
S703: and inputting the target multi-mode data into a target program classification model, and identifying the categories of the target program to obtain target category information corresponding to the target program.
In a specific embodiment, when the content of the text identified from the target program image is relatively complicated, the text identified from the target program image may first be subjected to processing such as cleaning, stop-word deletion and word segmentation, and then used as the target identification text. Alternatively, in the case where the text recognized from the target program image is long, TF-IDF (Term Frequency-Inverse Document Frequency), a common weighting technique for information retrieval and text mining, may be used to perform keyword extraction processing on the text recognized from the target program image, with the extracted keywords taken as the target identification text.
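The TF-IDF keyword-extraction step can be sketched in a few lines of stdlib Python: score each token of the recognized text against a small reference corpus and keep the top-scoring tokens as the target identification text. The smoothed-IDF formula and the toy corpus are illustrative choices, not the patent's exact implementation.

```python
import math
from collections import Counter

# Minimal TF-IDF keyword extraction over tokenized documents.
def tfidf_keywords(doc_tokens, corpus, top_k=2):
    n_docs = len(corpus)
    tf = Counter(doc_tokens)

    def idf(token):
        df = sum(1 for d in corpus if token in d)       # document frequency
        return math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF

    scores = {t: (count / len(doc_tokens)) * idf(t) for t, count in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

corpus = [["open", "file"], ["open", "level"], ["play", "level"]]
keywords = tfidf_keywords(["play", "arcade", "open"], corpus, top_k=2)
```

Tokens rare in the corpus ("arcade") outrank common ones ("open"), which is exactly why TF-IDF suits picking the distinctive words of an OCR'd screenshot.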
In a specific embodiment, the target multi-mode data includes a target program name, a target recognition text and a target program image, and specifically, inputting the target multi-mode data into a target program classification model, and performing category recognition on the target program, the obtaining target category information corresponding to the target program may include: and inputting the name, the target identification text and the target program image of the target program into a target program classification model, and carrying out category identification on the target program to obtain target category information corresponding to the target program.
As can be seen from the technical solutions provided in the embodiments of the present disclosure, in a program category identification scenario in the present disclosure, target category information corresponding to a target program is obtained by inputting target multi-mode data corresponding to the target program into a target program classification model for category identification; the target program classification model is a classification model obtained by training a second classification model to be trained based on first multi-mode data corresponding to a first sample program and first preset category information corresponding to the first sample program; the first preset category information is obtained by identifying categories of the first sample program based on the first program classification model and first multi-mode data corresponding to the first sample program; the first program classification model is obtained by performing category identification training on the first classification model to be trained based on second multi-mode data corresponding to a second sample program and second preset category information corresponding to the second sample program; and the second preset category information is obtained by identifying the category of the second sample program based on the second program classification model and the second sample program name corresponding to the second sample program, so that the effectiveness and the efficiency of model training by combining multi-mode data in the training process of the classification model to be trained can be improved, the accuracy of program category identification can be better improved, the maintenance and development costs of the model can be reduced, and the accuracy and the effectiveness of category identification in the program category identification process can be further improved.
The embodiment of the application also provides a training device of the program classification model, and correspondingly, fig. 8 is a schematic structural diagram of the training device of the program classification model provided by the embodiment of the application; as shown in fig. 8, the above-mentioned apparatus includes:
an information obtaining module 810, configured to obtain first multimodal data corresponding to a first sample program and first preset category information corresponding to the first sample program; the first preset category information is obtained by category identification of a first sample program based on a first program classification model and the first multi-mode data corresponding to the first sample program; the first program classification model is obtained by performing category identification training on a first to-be-trained classification model based on second multi-mode data corresponding to a second sample program and second preset category information corresponding to the second sample program; the second preset category information is obtained by identifying the category of the second sample program based on a second program classification model and a second sample program name corresponding to the second sample program;
the category identification processing module 820 is configured to input the first multimodal data into a second classification model to be trained to perform category identification, so as to obtain first prediction category information corresponding to the first sample program;
The model training module 830 is configured to train the second classification model to be trained according to the first prediction category information and the first preset category information, so as to obtain a target program classification model.
In an alternative embodiment, the second classification model to be trained includes: a text feature extraction model, an image feature extraction model, a feature fusion model and a classification model; the first multimodal data includes a first sample program text and a first sample program image;
the category identification processing module 820 includes:
the text feature extraction module is used for inputting the first sample program text into the text feature extraction model for program text feature extraction to obtain program text features;
the image feature extraction module is used for inputting the first sample program image into the image feature extraction model to extract program image features so as to obtain program image features;
the feature fusion processing module is used for inputting the program text features and the program image features into the feature fusion model to perform feature fusion processing to obtain first program fusion features;
and the category information determining module is used for inputting the first program fusion characteristic into the classification model for classification processing to obtain the first prediction category information.
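The four submodels above (text feature extraction, image feature extraction, feature fusion, classification) form a single forward pass. A minimal sketch follows; the feature extractors and the threshold classifier are toy stand-ins for the real submodels, and all names are invented for illustration.

```python
# Toy forward pass mirroring the four submodels of the
# second classification model to be trained.

def text_features(text):
    # toy text features: character count and vowel count
    return [float(len(text)), float(sum(c in "aeiou" for c in text))]

def image_features(pixels):
    # toy image features: mean and max pixel intensity
    return [sum(pixels) / len(pixels), float(max(pixels))]

def fuse(text_feat, image_feat):
    # simplest possible feature fusion model: concatenation
    return text_feat + image_feat

def classify(fused, categories=("news", "sports")):
    # toy classifier: threshold on the sum of the fused features
    return categories[0] if sum(fused) < 100 else categories[1]

# first sample program text + first sample program image -> prediction
fused = fuse(text_features("morning news"), image_features([10, 20, 30]))
print(len(fused), classify(fused))
```

A learned system would replace each function with a trained network, but the composition order is the same as in the module description above.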
In an alternative embodiment, the first sample program text includes a first sample program name and a first sample identification text corresponding to the first sample program image; the program text features comprise program name features and program identification text features;
the text feature extraction module comprises:
the name feature extraction unit is used for inputting the first sample program name into the text feature extraction model to extract program name features, so as to obtain the program name features;
and the identification text feature extraction unit is used for inputting the first sample identification text into the text feature extraction model to perform program identification text feature extraction so as to obtain the program identification text feature.
In an alternative embodiment, the information acquisition module 810 includes:
a data acquisition unit, configured to acquire the second sample program name and the second multimodal data;
the first category identification processing unit is used for inputting the second sample program name into the second program classification model and performing category identification on the second sample program to obtain the second preset category information corresponding to the second sample program;
the model training unit is used for performing category identification training on the first to-be-trained classification model based on the second multimodal data and the second preset category information to obtain the first program classification model;
the second category identification processing unit is used for inputting the first multimodal data into the first program classification model and performing category identification on the first sample program to obtain the first preset category information corresponding to the first sample program.
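The label-bootstrapping chain described by these units (name-only model labels the second samples, those labels train the first program classification model, which in turn labels the first samples) can be illustrated with keyword toys. The function names and the keyword rule below are hypothetical, chosen only to show the chain.

```python
# Toy illustration of the two-stage pseudo-labelling chain.

def name_only_model(name):
    # stand-in for the second program classification model,
    # which looks only at the program name
    return "sports" if "match" in name else "news"

def train_first_model(second_samples):
    # 'training': remember name -> pseudo-label; a real model would
    # also learn from the image and identification-text modalities
    table = {s["name"]: name_only_model(s["name"]) for s in second_samples}
    def first_model(multimodal_data):
        return table.get(multimodal_data["name"], "news")
    return first_model

second_samples = [{"name": "cup match", "image": [1]},
                  {"name": "daily brief", "image": [2]}]
first_model = train_first_model(second_samples)

# the first program classification model now produces the
# first preset category information for a first sample program
first_preset = first_model({"name": "cup match", "image": [3]})
print(first_preset)
```

The design choice worth noting is that each stage needs no manual labels: supervision originates from the cheap name-only model and is progressively transferred to richer multimodal models.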
In an alternative embodiment, the second multimodal data includes the second sample program name, a second sample program image, and second sample identification text corresponding to the second sample program image;
the model training unit includes:
the information acquisition subunit is used for acquiring first preset weight information corresponding to the second sample program name, second preset weight information corresponding to the second sample identification text, and third preset weight information corresponding to the second sample program image;
the feature fusion processing subunit is used for performing weighted fusion processing on the second sample program name, the second sample identification text, and the second sample program image based on the first preset weight information, the second preset weight information, and the third preset weight information to obtain a second program fusion feature;
the category identification processing subunit is used for inputting the second program fusion feature into the first to-be-trained classification model for category identification to obtain second prediction category information corresponding to the second sample program;
and the model training subunit is used for training the first to-be-trained classification model according to the second prediction category information and the second preset category information to obtain the first program classification model.
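The weighted fusion step can be sketched as an element-wise weighted sum of the three modality features. The concrete weights and the two-dimensional feature vectors below are invented for illustration; real systems would fuse learned embeddings of the name, identification text, and image.

```python
# Toy weighted fusion of the three modalities with preset weights.

def weighted_fusion(name_feat, text_feat, image_feat,
                    w_name, w_text, w_image):
    # element-wise weighted sum of equal-length feature vectors
    return [w_name * n + w_text * t + w_image * i
            for n, t, i in zip(name_feat, text_feat, image_feat)]

# first/second/third preset weight information (hypothetical values)
fused = weighted_fusion([1.0, 2.0],   # program-name features
                        [3.0, 4.0],   # identification-text features
                        [5.0, 6.0],   # program-image features
                        w_name=0.5, w_text=0.3, w_image=0.2)
print([round(x, 2) for x in fused])
```

Keeping the weights as preset hyperparameters (rather than learned ones) matches the "preset weight information" wording above and makes the relative contribution of each modality easy to tune.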
The specific manner in which the various modules of the apparatus in the above embodiment perform their operations has been described in detail in the embodiments of the method and will not be described in detail here.
The embodiment of the application also provides a program category identification device, and correspondingly, fig. 9 is a schematic structural diagram of the program category identification device provided by the embodiment of the application; as shown in fig. 9, the above-mentioned apparatus includes:
a data acquisition module 910, configured to acquire target multimodal data corresponding to a target program;
the category information determining module 920 is configured to input the target multimodal data into a target program classification model and perform category identification on the target program to obtain target category information corresponding to the target program, where the target program classification model is obtained based on the training method of the program classification model provided by the embodiments of the present application.
The specific manner in which the various modules of the apparatus in the above embodiment perform their operations has been described in detail in the embodiments of the method and will not be described in detail here.
Fig. 10 is a block diagram of an electronic device, which may be a terminal, for training a program classification model or identifying a program category according to an embodiment of the present application; its internal structure may be as shown in fig. 10. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a training method for a program classification model or a program category identification method. The display screen of the electronic device may be a liquid crystal display screen or an electronic ink display screen; the input device of the electronic device may be a touch layer covering the display screen, may be a key, a trackball, or a touchpad arranged on the housing of the electronic device, or may be an external keyboard, touchpad, or mouse.
FIG. 11 is a block diagram of another electronic device, which may be a server, for training a program classification model or identifying program categories, according to an embodiment of the present application, and an internal structure diagram thereof may be as shown in FIG. 11. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a training method for a program classification model or a program category identification method.
It will be appreciated by those skilled in the art that the structures shown in fig. 10 or 11 are merely block diagrams of partial structures related to the present disclosure and do not constitute limitations of the electronic device to which the present disclosure is applied, and that a particular electronic device may include more or fewer components than shown in the drawings, or may combine certain components, or have different arrangements of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement a training method of a program classification model or a program category identification method as in embodiments of the present disclosure.
In an exemplary embodiment, a computer-readable storage medium is also provided; instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the program classification model or the program category identification method in the embodiments of the present disclosure.
In an exemplary embodiment, a computer program product or a computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the training method of the program classification model or the program category identification method provided in the above-described various alternative implementations.
It will be appreciated that, in the specific embodiments of the present application, where user-related data is involved, user approval or consent is required when the above embodiments are applied to specific products or technologies, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of training a program classification model, the method comprising:
acquiring first multimodal data corresponding to a first sample program and first preset category information corresponding to the first sample program; the first preset category information is obtained by performing category identification on the first sample program based on a first program classification model and the first multimodal data corresponding to the first sample program; the first program classification model is obtained by performing category identification training on a first to-be-trained classification model based on second multimodal data corresponding to a second sample program and second preset category information corresponding to the second sample program; the second preset category information is obtained by performing category identification on the second sample program based on a second program classification model and a second sample program name corresponding to the second sample program;
inputting the first multimodal data into a second classification model to be trained for category identification to obtain first prediction category information corresponding to the first sample program;
and training the second classification model to be trained according to the first prediction category information and the first preset category information to obtain a target program classification model.
2. The method for training a program classification model according to claim 1, wherein the second classification model to be trained comprises: a text feature extraction model, an image feature extraction model, a feature fusion model, and a classification model; the first multimodal data includes a first sample program text and a first sample program image;
the inputting the first multimodal data into the second classification model to be trained for category identification to obtain the first prediction category information corresponding to the first sample program comprises:
inputting the first sample program text into the text feature extraction model for program text feature extraction to obtain program text features;
inputting the first sample program image into the image feature extraction model for program image feature extraction to obtain program image features;
inputting the program text features and the program image features into the feature fusion model for feature fusion processing to obtain a first program fusion feature;
and inputting the first program fusion feature into the classification model for classification processing to obtain the first prediction category information.
3. The method for training a program classification model according to claim 2, wherein the first sample program text includes a first sample program name and a first sample identification text corresponding to the first sample program image; the program text features comprise program name features and program identification text features;
the inputting the first sample program text into the text feature extraction model for program text feature extraction to obtain the program text features comprises:
inputting the first sample program name into the text feature extraction model for program name feature extraction to obtain the program name features;
and inputting the first sample identification text into the text feature extraction model for program identification text feature extraction to obtain the program identification text features.
4. The method for training a program classification model according to claim 1, wherein the first preset category information corresponding to the first sample program is obtained by:
acquiring the second sample program name and the second multimodal data;
inputting the second sample program name into the second program classification model and performing category identification on the second sample program to obtain the second preset category information corresponding to the second sample program;
performing category identification training on the first to-be-trained classification model based on the second multimodal data and the second preset category information to obtain the first program classification model;
and inputting the first multimodal data into the first program classification model and performing category identification on the first sample program to obtain the first preset category information.
5. The method of claim 4, wherein the second multimodal data includes the second sample program name, a second sample program image, and second sample identification text corresponding to the second sample program image;
the performing category identification training on the first to-be-trained classification model based on the second multimodal data and the second preset category information to obtain the first program classification model comprises:
acquiring first preset weight information corresponding to the second sample program name, second preset weight information corresponding to the second sample identification text, and third preset weight information corresponding to the second sample program image;
performing weighted fusion processing on the second sample program name, the second sample identification text, and the second sample program image based on the first preset weight information, the second preset weight information, and the third preset weight information to obtain a second program fusion feature;
inputting the second program fusion feature into the first to-be-trained classification model for category identification to obtain second prediction category information corresponding to the second sample program;
and training the first to-be-trained classification model according to the second prediction category information and the second preset category information to obtain the first program classification model.
6. A program category identification method, the method comprising:
acquiring target multimodal data corresponding to a target program;
and inputting the target multimodal data into a target program classification model and performing category identification on the target program to obtain target category information corresponding to the target program, wherein the target program classification model is obtained based on the training method of the program classification model according to any one of claims 1 to 5.
7. A training device for a program classification model, the device comprising:
the information acquisition module is used for acquiring first multimodal data corresponding to a first sample program and first preset category information corresponding to the first sample program; the first preset category information is obtained by performing category identification on the first sample program based on a first program classification model and the first multimodal data corresponding to the first sample program; the first program classification model is obtained by performing category identification training on a first to-be-trained classification model based on second multimodal data corresponding to a second sample program and second preset category information corresponding to the second sample program; the second preset category information is obtained by performing category identification on the second sample program based on a second program classification model and a second sample program name corresponding to the second sample program;
the category identification processing module is used for inputting the first multimodal data into a second classification model to be trained for category identification to obtain first prediction category information corresponding to the first sample program;
and the model training module is used for training the second classification model to be trained according to the first prediction category information and the first preset category information to obtain a target program classification model.
8. A program category identification device, the device comprising:
the data acquisition module is used for acquiring target multimodal data corresponding to a target program;
and the category information determining module is used for inputting the target multimodal data into a target program classification model and performing category identification on the target program to obtain target category information corresponding to the target program, wherein the target program classification model is obtained based on the training method of the program classification model according to any one of claims 1 to 5.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the program classification model of any one of claims 1 to 5 or the program category identification method of claim 6.
10. A computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the training method of the program classification model of any one of claims 1 to 5 or the program category identification method of claim 6.
CN202311321797.0A 2023-10-13 2023-10-13 Program classification model training and program category identification method and device Active CN117056836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311321797.0A CN117056836B (en) 2023-10-13 2023-10-13 Program classification model training and program category identification method and device

Publications (2)

Publication Number Publication Date
CN117056836A CN117056836A (en) 2023-11-14
CN117056836B true CN117056836B (en) 2023-12-12

Family

ID=88663112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311321797.0A Active CN117056836B (en) 2023-10-13 2023-10-13 Program classification model training and program category identification method and device

Country Status (1)

Country Link
CN (1) CN117056836B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522450A (en) * 2018-11-29 2019-03-26 腾讯科技(深圳)有限公司 A kind of method and server of visual classification
CN113010671A (en) * 2021-02-22 2021-06-22 杭州西湖数据智能研究院 App classification system
WO2021208722A1 (en) * 2020-11-26 2021-10-21 平安科技(深圳)有限公司 Classification model training method, apparatus, terminal, and storage medium
CN113553434A (en) * 2021-09-17 2021-10-26 支付宝(杭州)信息技术有限公司 Application classification method, device and equipment
CN114281664A (en) * 2021-12-24 2022-04-05 中国农业银行股份有限公司 Application program load data prediction method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920156A (en) * 2018-05-29 2018-11-30 Oppo广东移动通信有限公司 Application program prediction model method for building up, device, storage medium and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on identification and classification methods for privacy-infringing apps; Yi Li et al.; Information Technology and Network Security; Vol. 40, No. 12; pp. 8-14 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant