CN116910377A - Grid event classified search recommendation method and system - Google Patents


Info

Publication number
CN116910377A
Authority
CN
China
Prior art keywords
class
text
categories
event
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311185198.0A
Other languages
Chinese (zh)
Other versions
CN116910377B (en)
Inventor
林韶军
黄炳裕
戴文艳
何亦龙
倪坤
黄河
叶威鑫
刘骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Evecom Information Technology Development Co ltd
Original Assignee
Evecom Information Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Evecom Information Technology Development Co ltd
Priority to CN202311185198.0A
Publication of CN116910377A
Application granted
Publication of CN116910377B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a grid event classified search recommendation method comprising the following steps: dividing grid events into three levels, where some second-level categories contain third-level categories; converting the second-level category information of the text classification task into vectors with a sentence vector model and storing them in a vector retrieval library; for a given input text, converting the text into a vector representation with the sentence vector model and recalling the several categories with the highest cosine similarity from the vector retrieval library; constructing features from the recalled category texts and the input text, feeding them into a ranking model to obtain ranking scores, and selecting the second-level category; if the second-level category contains third-level categories, obtaining the third-level category by keyword matching based on the obtained second-level category; and determining the first-level category in reverse from the obtained second-level and third-level categories. By applying a search recommendation algorithm, the method automatically identifies and recommends suitable event categories to the user, improving the accuracy and efficiency of event category selection.

Description

Grid event classified search recommendation method and system
Technical Field
The application relates to the field of grid event management, and in particular to a grid event classified search recommendation method and system.
Background
Urban grid events are highly heterogeneous: they are typically organized in multiple levels and may cover hundreds of distinct categories. When creating a grid event, a user faces a large number of category choices, which makes it difficult to select the correct event category efficiently and accurately.
Existing grid event classification methods typically fine-tune a pre-trained language model on the downstream task. A common approach uses a pre-trained BERT model as the backbone and adds a fully connected classification layer on top of it to build the classification model. During training, event classification is achieved by fine-tuning the model parameters.
The main idea of this method is to apply the semantic representation capability of the pre-trained language model to the specific grid event classification task. By fine-tuning the pre-trained model, the model learns the features and semantic information relevant to event classification; the added fully connected layer then performs the classification and produces the predicted event category.
However, the fine-tuning approach cannot be trained on categories that have no samples, and when categories need to be added or removed, the whole model must be retrained, so its extensibility is poor. Moreover, grid event categories carry rich textual content, yet fine-tuning methods usually represent categories with one-hot encoding. This encoding cannot exploit the semantic information of the category text, so the model struggles to capture the associations and semantic features between events and categories, which harms classification accuracy.
Disclosure of Invention
To solve the above problems, an object of the present application is to provide a grid event classified search recommendation method that automatically identifies and recommends suitable event categories to the user by applying a search recommendation algorithm, thereby improving the accuracy and efficiency of event category selection.
In order to achieve the above purpose, the present application adopts the following technical scheme:
a grid event classified search recommendation method comprises the following steps:
dividing the grid event into three levels, wherein part of the two-level categories comprise three-level categories;
converting the secondary category information in the text classification task into a vector by using a sentence vector model, and storing the vector in a vector retrieval library;
for a given input text, converting the text into vector representation through a sentence vector model, and recalling a plurality of categories with highest cosine similarity from a vector retrieval library;
after the characteristics of the plurality of category texts and the input text are constructed, inputting a sorting model to obtain sorting scores of the categories to obtain a secondary category;
if the secondary category exists in the tertiary categories, obtaining tertiary categories by adopting a keyword matching technology based on the obtained secondary categories;
and reversely determining the first class based on the obtained second class and the third class.
Further, the grid event hierarchy is constructed as follows: the topic serves as the first level, the event name serves as the second level, and the third level is a refinement of the second level indicating the urban component where the event occurs.
Further, the sentence vector model is trained with the SimCSE contrastive learning framework, specifically as follows:
the pre-trained model for SimCSE is rocketqa-zh-dureader-query-encoder; the inputs of the two towers are the text and the corresponding category, respectively. The training strategy is in-batch negatives, i.e. the categories of the other sample pairs in the same batch serve as negative samples, and sentence vector representations are learned by maximizing the similarity between positive pairs while minimizing the similarity between negative pairs. The overall optimization objective is:

L_i = -log( exp(sim(h_i, h_i+)/τ) / Σ_{j=1..N} exp(sim(h_i, h_j+)/τ) )

where (h_i, h_i+) is a sample pair (the text vector and its category vector), h_j+ ranges over the category vectors of the other pairs in the same batch, and τ is a preset temperature parameter.
For categories without samples, a positive pair is constructed by adding noise to the text with the Dropout layer: the same category text is encoded twice with different dropout masks to give (h_i, h_i'), and the optimization objective has the same form:

L_i = -log( exp(sim(h_i, h_i')/τ) / Σ_{j=1..N} exp(sim(h_i, h_j')/τ) )
further, the training data construction of the sentence vector model is specifically as follows:
for the class with the marked sample, adopting the sample < event title >, event content > and the corresponding secondary class as positive samples;
for the class of the unlabeled sample, the secondary class is used as a positive sample with itself.
Further, the ranking model is built on an XGBoost tree model, specifically as follows:
starting from the cosine distance between the sentence vector representations and the element-wise difference of the sentence vectors, the Jaccard distance between text and category, the BM25 similarity, and the cosine similarity computed with a Word2Vec model are added, giving a 772-dimensional ranking feature;
the true category of the text is taken as the positive sample and labeled 1, and the other recalled results are taken as negative samples and labeled 0, giving the ranking labels;
based on the ranking features and ranking labels, XGBoost builds a tree as follows:
(1) Traverse each ranking feature, try different split points on each feature, compute the gain, and split at the point with the maximum gain. The gain is computed as

Gain = (1/2) [ G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ) ]

where G = Σ g_i and H = Σ h_i, with g_i and h_i the first and second derivatives of the loss function; the subscripts L and R denote the left and right subtrees, and λ is the regularization coefficient. For the logistic loss, g_i = p_i - y_i and h_i = p_i(1 - p_i), where y_i is the true label of the sample and p_i is the predicted probability; when building the first tree, the predicted probability of every sample is 0.5.
(2) Repeat the above step for the split nodes until the depth of the tree reaches a specified threshold, then stop growing the tree. The value of a leaf node is computed as

w = -G/(H + λ)

(3) Add the next tree; when computing its gains, update the predicted probability p_i of each sample from the trees already built:

p_i = sigmoid( Σ_{m=1..k-1} f_m(x_i) )

After training, the prediction for a sample is computed as

ŷ_i = Σ_{k=1..K} f_k(x_i)

where f_k(x_i) is the prediction of the kth tree for sample x_i.
Further, the objective function of the kth tree of the XGBoost tree model is:

Obj^(k) = Σ_i l( y_i, ŷ_i^(k-1) + f_k(x_i) ) + Ω(f_k)

where Ω represents the complexity of the tree, f_k represents the kth tree, y_i is the true label of the sample, p_i = sigmoid(ŷ_i) is the predicted probability, and x_i denotes the sample features.
Further, the keyword matching uses an Aho-Corasick (AC) automaton, specifically as follows:
a dictionary tree (trie) is constructed from keywords and third-level categories, where the keywords of each third-level category are obtained from historical data or manually curated data;
during matching, keywords for third-level categories of the urban component type are extracted from the event title and event content, while keywords for third-level categories of the site location type are extracted from the event title, event content, and event place.
A grid event classified search recommendation system comprises a user terminal, a text preprocessing module, a text vectorization module, a text search recommendation module, and a vector retrieval library;
grid events are divided into three levels, where some second-level categories contain third-level categories; the second-level category information of the text classification task is converted into vectors with a sentence vector model and stored in the vector retrieval library;
the user inputs text at the user terminal; the text preprocessing module concatenates the event title and event content and removes meaningless characters from the text;
the text vectorization module calls the sentence vector model to convert the text to be predicted into a vector;
the text search recommendation module calls the vector retrieval service to recall second-level categories and obtain the top-ranked candidates, constructs ranking features for the recalled second-level categories, and calls XGBoost to rank them and obtain the top category; if the second-level category contains third-level categories, the AC-automaton-based keyword matching algorithm is executed, and the third-level category is obtained when a match succeeds; finally, the first-level category is obtained from the mapping between first-level and second-level categories, and the first-, second-, and third-level categories are returned together with the classification confidence.
The application has the following beneficial effects:
the method automatically identifies and recommends suitable event categories to the user by applying a search recommendation algorithm, improving the accuracy and efficiency of event category selection;
the application is highly extensible: the model does not need to be retrained when second-level categories are added or removed;
even when the training set contains categories without samples, the SimCSE model can still construct positive samples for those categories and train on them, improving classification accuracy.
Drawings
FIG. 1 is a schematic flow chart of the method of the application.
Detailed Description
The application is described in further detail below with reference to the attached drawings and specific examples:
referring to fig. 1, in this embodiment, a grid event classification search recommendation method is provided, which includes the following steps:
dividing the grid event into three levels, wherein part of the two-level categories comprise three-level categories;
converting the secondary category information in the text classification task into a vector by using a sentence vector model, and storing the vector in a vector retrieval library;
for a given input text, converting the text into vector representation through a sentence vector model, and recalling a plurality of categories with highest cosine similarity from a vector retrieval library;
after the characteristics of the plurality of category texts and the input text are constructed, inputting a sorting model to obtain sorting scores of the categories to obtain a secondary category;
if the secondary category exists in the tertiary categories, obtaining tertiary categories by adopting a keyword matching technology based on the obtained secondary categories;
and reversely determining the first class based on the obtained second class and the third class.
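The steps above can be sketched end to end as follows. This is a minimal illustration only: all function names, category names, and data structures below are invented for the example and do not come from the patent.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(text, text_vec, index, rank_fn, l3_keywords, l2_to_l1, top_k=3):
    """Recall -> rank -> keyword match -> map back to the first-level category.
    index: level-2 category name -> sentence vector;
    rank_fn: (text, category) -> ranking score (stand-in for the XGBoost model);
    l3_keywords: level-2 name -> {level-3 name: [keywords]}."""
    # 1. recall the top_k second-level categories by cosine similarity
    recalled = sorted(index, key=lambda c: cosine(text_vec, index[c]), reverse=True)[:top_k]
    # 2. re-rank the recalled candidates and keep the best one
    level2 = max(recalled, key=lambda c: rank_fn(text, c))
    # 3. if this second-level category has third-level categories, keyword-match them
    level3 = next((l3 for l3, kws in l3_keywords.get(level2, {}).items()
                   if any(kw in text for kw in kws)), None)
    # 4. map the second-level category back to its first-level topic
    return l2_to_l1[level2], level2, level3
```

In the real system, `cosine` recall would go through the vector retrieval library and `rank_fn` through the trained XGBoost ranker; the control flow is the same.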
In this embodiment, the grid event hierarchy is constructed as follows:
the first level consists of broad topics such as "urban environment" and "street order";
the second level is the event name, such as "pole broken, pole missing, box obviously rusted" or "operating outside the storefront";
the third level is a refinement of the second level, indicating the urban component or place where the event occurs; for example, the third-level categories of "pole broken, pole missing, box obviously rusted" are the specific kinds of pole, such as "electric pole" and "communication pole".
Examples are as follows:
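As a purely illustrative stand-in for the example table that is not reproduced here (reusing the category names above), the three-level hierarchy can be represented as a nested mapping:

```python
# hypothetical illustration; not the patent's actual category table
hierarchy = {
    "urban environment": {                                  # level 1: topic
        "pole broken, pole missing, box obviously rusted": [  # level 2: event name
            "electric pole", "communication pole",            # level 3: urban component
        ],
    },
    "street order": {
        "operating outside the storefront": [],             # no third-level categories
    },
}
```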
In this embodiment, the sentence vector model is trained with the SimCSE contrastive learning framework, specifically as follows:
the pre-trained model for SimCSE is rocketqa-zh-dureader-query-encoder; the inputs of the two towers are the text and the corresponding category, respectively. The training strategy is in-batch negatives, i.e. the categories of the other sample pairs in the same batch serve as negative samples, and sentence vector representations are learned by maximizing the similarity between positive pairs while minimizing the similarity between negative pairs. The overall optimization objective is:

L_i = -log( exp(sim(h_i, h_i+)/τ) / Σ_{j=1..N} exp(sim(h_i, h_j+)/τ) )

where (h_i, h_i+) is a sample pair (the text vector and its category vector), h_j+ ranges over the category vectors of the other pairs in the same batch, and τ is a preset temperature parameter.
For categories without samples, a positive pair is constructed by adding noise to the text with the Dropout layer: the same category text is encoded twice with different dropout masks to give (h_i, h_i'), and the optimization objective has the same form:

L_i = -log( exp(sim(h_i, h_i')/τ) / Σ_{j=1..N} exp(sim(h_i, h_j')/τ) )
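The in-batch negative objective can be sketched numerically as follows. This is a NumPy illustration of the InfoNCE-style loss, not the patent's actual training code:

```python
import numpy as np

def in_batch_negative_loss(text_vecs, cat_vecs, tau=0.05):
    """SimCSE-style loss: for row i, cat_vecs[i] is the positive and the
    other categories in the same batch are the negatives."""
    a = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    b = cat_vecs / np.linalg.norm(cat_vecs, axis=1, keepdims=True)
    sim = (a @ b.T) / tau                      # (N, N) scaled cosine similarities
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))
```

For a category without samples, `text_vecs` and `cat_vecs` would be two dropout-noised encodings of the same category text, so the loss keeps the same form.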
Further, the training data for the sentence vector model is constructed as follows:
for categories with labeled samples, the sample <event title, event content> and the corresponding second-level category are used as a positive pair;
for categories without labeled samples, the second-level category itself is paired with itself as a positive pair;
in this way, categories without samples can also be trained, increasing their similarity distance from other categories.
Training data construction examples are as follows:
Categories with labeled samples: the collected samples (event title + event content) and the corresponding second-level categories are used as positive samples.
Categories without labeled samples: the second-level category is paired with itself as a positive sample.
The labeled positive sample data are as follows:
The unlabeled positive sample data are as follows:
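Since the original example tables are not reproduced here, the pair construction can be sketched as follows; the helper name, tuple layout, and example strings are illustrative assumptions:

```python
def build_pairs(labeled, all_categories):
    """labeled: list of (event_title, event_content, level2_category) tuples.
    Returns (text, category) positive pairs for SimCSE training."""
    pairs = [(title + content, cat) for title, content, cat in labeled]
    # categories with no labeled sample are paired with themselves; at encoding
    # time, dropout noise makes the two views of the same text differ
    covered = {cat for _, cat in pairs}
    pairs += [(cat, cat) for cat in all_categories if cat not in covered]
    return pairs
```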
In this embodiment, the ranking model is built on an XGBoost tree model, specifically as follows:
Ranking feature construction: the cosine distance between the sentence vector representations (1 dimension), the element-wise difference of the sentence vectors (768 dimensions), the Jaccard distance between text and category (1 dimension), the BM25 similarity (1 dimension), and the cosine similarity computed with a Word2Vec model (1 dimension) are concatenated to give a 772-dimensional ranking feature;
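Concatenating these components can be sketched at the shape level as follows; the vectors and the three scalar scores are assumed to be computed elsewhere:

```python
import numpy as np

def ranking_features(text_vec, cat_vec, jaccard, bm25, w2v_cos):
    """772-dim ranking feature: 1 cosine distance + 768-dim vector difference
    + Jaccard distance + BM25 similarity + Word2Vec cosine similarity."""
    cos = float(text_vec @ cat_vec /
                (np.linalg.norm(text_vec) * np.linalg.norm(cat_vec)))
    return np.concatenate([[1.0 - cos], text_vec - cat_vec,
                           [jaccard, bm25, w2v_cos]])
```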
the true category of the text is taken as the positive sample and labeled 1, and the other recalled results are taken as negative samples and labeled 0, giving the ranking labels;
based on the ranking features and ranking labels, XGBoost builds a tree as follows:
(1) Traverse each ranking feature, try different split points on each feature, compute the gain, and split at the point with the maximum gain. The gain is computed as

Gain = (1/2) [ G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ) ]

where G = Σ g_i and H = Σ h_i, with g_i and h_i the first and second derivatives of the loss function; the subscripts L and R denote the left and right subtrees, and λ is the regularization coefficient. For the logistic loss, g_i = p_i - y_i and h_i = p_i(1 - p_i), where y_i is the true label of the sample and p_i is the predicted probability; when building the first tree, the predicted probability of every sample is 0.5.
(2) Repeat the above step for the split nodes until the depth of the tree reaches a specified threshold, then stop growing the tree. The value of a leaf node is computed as

w = -G/(H + λ)

(3) Add the next tree; when computing its gains, update the predicted probability p_i of each sample from the trees already built:

p_i = sigmoid( Σ_{m=1..k-1} f_m(x_i) )

After training, the prediction for a sample is computed as

ŷ_i = Σ_{k=1..K} f_k(x_i)

where f_k(x_i) is the prediction of the kth tree for sample x_i.
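The gain and leaf-value formulas above can be checked numerically with a small sketch (logistic-loss derivatives; λ is the regularization coefficient):

```python
def logistic_grads(y, p):
    """First and second derivatives of the logistic loss at prediction p."""
    return p - y, p * (1.0 - p)

def split_gain(GL, HL, GR, HR, lam=1.0):
    """Gain of a candidate split: children's scores minus the parent's score."""
    score = lambda G, H: G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR))

def leaf_value(G, H, lam=1.0):
    """Optimal leaf weight for accumulated gradients G and Hessians H."""
    return -G / (H + lam)
```

With the initial prediction p = 0.5 for every sample, each positive sample contributes g = -0.5 and each negative sample g = +0.5, with h = 0.25, so a split that separates positives from negatives yields a strictly positive gain.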
In this embodiment, the objective function of the kth tree of the XGBoost tree model is:

Obj^(k) = Σ_i l( y_i, ŷ_i^(k-1) + f_k(x_i) ) + Ω(f_k)

where Ω represents the complexity of the tree, f_k represents the kth tree, y_i is the true label of the sample, p_i = sigmoid(ŷ_i) is the predicted probability, and x_i denotes the sample features.
In this embodiment, the keyword matching uses an Aho-Corasick (AC) automaton, specifically as follows:
a dictionary tree (trie) is constructed from keywords and third-level categories, where the keywords of each third-level category are obtained from historical data or manually curated data; for example, the keywords of the third-level category "communication manhole cover" include "mobile manhole cover", "telecom manhole cover", "communication manhole cover", and the like.
During matching, keywords for third-level categories of the urban component type are extracted from the event title and event content, while keywords for third-level categories of the site location type are extracted from the event title, event content, and event place.
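The trie-plus-failure-link construction can be sketched as a compact, illustrative Aho-Corasick implementation (not the patent's code; the keywords and categories in the usage example are invented):

```python
from collections import deque

class ACAutomaton:
    """Minimal Aho-Corasick automaton mapping keywords to third-level categories."""

    def __init__(self, keyword_to_category):
        self.goto = [{}]   # per-node character transitions (node 0 is the root)
        self.fail = [0]    # failure links
        self.out = [[]]    # categories emitted when a node is reached
        for kw, cat in keyword_to_category.items():
            node = 0
            for ch in kw:                       # insert keyword into the trie
                nxt = self.goto[node].get(ch)
                if nxt is None:
                    nxt = self._new_node()
                    self.goto[node][ch] = nxt
                node = nxt
            self.out[node].append(cat)
        self._build_fail()

    def _new_node(self):
        self.goto.append({}); self.fail.append(0); self.out.append([])
        return len(self.goto) - 1

    def _build_fail(self):
        q = deque(self.goto[0].values())        # depth-1 nodes fail to the root
        while q:
            u = q.popleft()
            for ch, v in self.goto[u].items():
                f = self.fail[u]                # follow failure links upward
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[v] = self.goto[f].get(ch, 0)
                self.out[v] = self.out[v] + self.out[self.fail[v]]
                q.append(v)

    def match(self, text):
        node, hits = 0, []
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            hits.extend(self.out[node])         # all keywords ending here
        return hits
```

The automaton scans the concatenated event title / content / place once, in time linear in the text length regardless of the number of keywords, which is why it suits matching many third-level category keywords at once.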
In this embodiment, a grid event classified search recommendation system is also provided, comprising a user terminal, a text preprocessing module, a text vectorization module, a text search recommendation module, and a vector retrieval library;
grid events are divided into three levels, where some second-level categories contain third-level categories; the second-level category information of the text classification task is converted into vectors with a sentence vector model and stored in the vector retrieval library;
the user inputs text at the user terminal; the text preprocessing module concatenates the event title and event content and removes meaningless characters from the text;
the text vectorization module calls the sentence vector model to convert the text to be predicted into a vector;
the text search recommendation module calls the vector retrieval service to recall second-level categories and obtain the top-ranked candidates, constructs ranking features for the recalled second-level categories, and calls XGBoost to rank them and obtain the top category; if the second-level category contains third-level categories, the AC-automaton-based keyword matching algorithm is executed, and the third-level category is obtained when a match succeeds;
urban component categories are matched on "event title" + "event content";
site location categories are matched on "event title" + "event content" + "event place";
finally, the first-level category is obtained from the mapping between first-level and second-level categories, and the first-, second-, and third-level categories are returned together with the classification confidence.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description presents only preferred embodiments of the present application and is not intended to limit the application in any way; any person skilled in the art may modify or adapt the disclosed technical content into equivalent embodiments. However, any simple modification or equivalent variation of the above embodiments made according to the technical substance of the present application still falls within the protection scope of the technical solution of the present application.

Claims (8)

1. A grid event classified search recommendation method, comprising the following steps:
dividing grid events into three levels, where some second-level categories contain third-level categories;
converting the second-level category information of the text classification task into vectors with a sentence vector model and storing them in a vector retrieval library;
for a given input text, converting the text into a vector representation with the sentence vector model and recalling the several categories with the highest cosine similarity from the vector retrieval library;
constructing features from the recalled category texts and the input text, feeding them into a ranking model to obtain ranking scores, and selecting the second-level category;
if the second-level category contains third-level categories, obtaining the third-level category by keyword matching based on the obtained second-level category;
and determining the first-level category in reverse from the obtained second-level and third-level categories.
2. The grid event classified search recommendation method according to claim 1, wherein the grid event hierarchy is constructed as follows: the topic serves as the first level, the event name serves as the second level, and the third level is a refinement of the second level indicating the urban component where the event occurs.
3. The grid event classified search recommendation method according to claim 1, wherein the sentence vector model is trained with the SimCSE contrastive learning framework, specifically as follows:
the pre-trained model for SimCSE is rocketqa-zh-dureader-query-encoder; the inputs of the two towers are the text and the corresponding category, respectively. The training strategy is in-batch negatives, i.e. the categories of the other sample pairs in the same batch serve as negative samples, and sentence vector representations are learned by maximizing the similarity between positive pairs while minimizing the similarity between negative pairs. The overall optimization objective is:

L_i = -log( exp(sim(h_i, h_i+)/τ) / Σ_{j=1..N} exp(sim(h_i, h_j+)/τ) )

where (h_i, h_i+) is a sample pair (the text vector and its category vector), h_j+ ranges over the category vectors of the other pairs in the same batch, and τ is a preset temperature parameter.
For categories without samples, a positive pair is constructed by adding noise to the text with the Dropout layer: the same category text is encoded twice with different dropout masks to give (h_i, h_i'), and the optimization objective has the same form:

L_i = -log( exp(sim(h_i, h_i')/τ) / Σ_{j=1..N} exp(sim(h_i, h_j')/τ) )
4. The grid event classified search recommendation method according to claim 3, wherein the training data for the sentence vector model is constructed as follows:
for categories with labeled samples, the sample <event title, event content> and the corresponding second-level category are used as a positive pair;
for categories without labeled samples, the second-level category itself is paired with itself as a positive pair.
5. The grid event classification search recommendation method according to claim 1, wherein the ranking model is constructed based on an XGBoost tree model, specifically comprising the following steps:
based on the cosine distance represented by the sentence vectors and the difference of the sentence vectors, the Jacquard distance between the text and the category, BM25 similarity and cosine similarity calculated by using a Word2Vec model are added, and 772-dimensional ordering characteristics are obtained;
taking the actual category of the text as a positive sample, marking 1, taking other recall results as negative samples, marking 0, and obtaining a sequencing label;
based on the ranking features and ranking labels, XGBoost builds a tree as follows:
(1) Traverse each ranking feature, evaluate candidate split points on it, compute the gain of each, and split at the point with the maximum gain, wherein the gain is computed as:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

wherein $G=\sum_i g_i$ and $H=\sum_i h_i$ are the sums of the first and second derivatives $g_i$ and $h_i$ of the loss function, the subscripts $L$ and $R$ denote the left and right subtrees, $\lambda$ is the regular term coefficient and $\gamma$ the split penalty, $y_i$ is the true label of a sample and $\hat{y}_i$ its predicted probability; when the first tree is constructed, the predicted probability of every sample is 0.5;
(2) Repeat step (1) for the split nodes until the depth of the tree reaches the specified threshold, then stop growing the tree; the value of each leaf node is computed as:

$$w_j = -\frac{G_j}{H_j+\lambda}$$

wherein $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the samples falling into leaf $j$;
(3) Add the next tree; when computing the gain, update the predicted probability of each sample with the output of the previous tree:

$$\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i)$$
after training, the prediction for a sample is computed with the following formula:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

wherein $f_k(x_i)$ represents the prediction of sample $x_i$ in the $k$-th tree.
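The gain, leaf-value, and derivative computations in the tree-building steps above can be sketched numerically; this follows the standard XGBoost formulas rather than any patent-specific variant, and the function names are assumptions:

```python
def logloss_grad_hess(y, p):
    """First and second derivatives of log loss w.r.t. the prediction.
    At the first tree every p is 0.5, giving g = p - y, h = p * (1 - p)."""
    return p - y, p * (1.0 - p)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of one candidate split.

    g_*/h_* are the sums of first/second derivatives over the samples in
    the left and right subtree; lam is the regularization coefficient,
    gamma the split penalty.
    """
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

def leaf_value(g_sum, h_sum, lam=1.0):
    """Optimal leaf weight w = -G / (H + lam)."""
    return -g_sum / (h_sum + lam)
```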
6. The grid event classification search recommendation method of claim 5, wherein in the XGBoost tree model, the objective function of the $k$-th tree is as follows:

$$\mathrm{Obj}^{(k)} = \sum_{i} l\!\left(y_i,\, \hat{y}_i^{(k-1)} + f_k(x_i)\right) + \Omega(f_k)$$

wherein $\Omega$ represents the complexity of the tree, $f_k$ represents the $k$-th tree, $y_i$ is the true label of a sample, $\hat{y}_i$ is the predicted probability, and $x_i$ represents the sample features.
7. The grid event classification search recommendation method of claim 1, wherein the keyword matching technique adopts an AC automaton, specifically comprising the following steps:
constructing a dictionary tree composed of keywords and tertiary categories, wherein the keywords of each tertiary category are obtained from historical data or manually curated data;
during keyword matching, keywords of the tertiary categories under the city-component class are extracted from the event title and event content, and keywords of the tertiary categories under the venue-location class are extracted from the event title, event content, and event location.
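The dictionary tree and matching step above amount to a standard Aho-Corasick automaton; a compact self-contained sketch (illustrative names, not the patent's code) is:

```python
from collections import deque

def build_automaton(keyword_to_category):
    """Build a minimal Aho-Corasick automaton over the keyword dictionary.

    keyword_to_category maps each keyword to its tertiary category; the
    structure mirrors the dictionary tree described in the claim.
    """
    trie = [{"next": {}, "fail": 0, "out": []}]
    for word, cat in keyword_to_category.items():
        node = 0
        for ch in word:
            nxt = trie[node]["next"].get(ch)
            if nxt is None:
                nxt = len(trie)
                trie.append({"next": {}, "fail": 0, "out": []})
                trie[node]["next"][ch] = nxt
            node = nxt
        trie[node]["out"].append((word, cat))
    # Breadth-first pass to set failure links and merge output lists.
    queue = deque(trie[0]["next"].values())
    while queue:
        node = queue.popleft()
        for ch, child in trie[node]["next"].items():
            f = trie[node]["fail"]
            while f and ch not in trie[f]["next"]:
                f = trie[f]["fail"]
            trie[child]["fail"] = trie[f]["next"].get(ch, 0)
            trie[child]["out"] += trie[trie[child]["fail"]]["out"]
            queue.append(child)
    return trie

def match_keywords(trie, text):
    """Scan text once, returning every (keyword, category) hit."""
    node, hits = 0, []
    for ch in text:
        while node and ch not in trie[node]["next"]:
            node = trie[node]["fail"]
        node = trie[node]["next"].get(ch, 0)
        hits.extend(trie[node]["out"])
    return hits
```

A single pass over the spliced title/content text therefore finds all dictionary keywords, regardless of how many tertiary categories contribute keywords.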
8. A system implementing the grid event classification search recommendation method according to any one of claims 1-7, comprising a user side, a text preprocessing module, a text vectorization module, a text search recommendation module, and a vector retrieval library;
the grid events are divided into three levels, part of the secondary categories contain tertiary categories, and the secondary category information in the text classification task is converted into vectors by the sentence vector model and stored in the vector retrieval library;
a user inputs text at the user side; the text preprocessing module splices the event title and event content and removes meaningless characters from the text;
the text vectorization module calls the sentence vector model to convert the text to be predicted into a vector;
the text search recommendation module calls the vector retrieval service to recall secondary categories and obtain the top-ranked ones, constructs ranking features for the recalled secondary categories, and calls XGBoost to rank them and obtain the top category; if tertiary categories exist under the secondary category, a keyword matching algorithm based on the AC automaton is executed, and the tertiary category is obtained if matching succeeds; finally, the primary category is obtained according to the mapping relation between primary and secondary categories, and the primary, secondary, and tertiary categories are returned together with the classification confidence.
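The recall, rank, keyword-match, and mapping stages of the system claim can be wired together as follows; `recall`, `rerank`, and `kw_match` stand in for the vector retrieval service, the XGBoost ranker, and the AC-automaton matcher, and all names and signatures here are illustrative assumptions rather than the patent's API:

```python
def classify_event(title, content, place, recall, rerank, kw_match,
                   secondary_to_primary, top_k=10):
    """End-to-end sketch of the search-recommendation flow."""
    # Preprocessing: splice title and content (meaningless-character
    # removal is omitted here for brevity).
    text = f"{title},{content}"
    # Recall the top secondary categories from the vector index.
    candidates = [cat for cat, _ in recall(text, top_k)]
    # Re-rank the recalled categories with the tree model.
    secondary, confidence = rerank(text, candidates)[0]
    # Try to refine to a tertiary category via keyword matching
    # (returns None when the secondary category has no tertiary level
    # or no keyword matches).
    tertiary = kw_match(secondary, title, content, place)
    # Map the secondary category back to its primary category.
    primary = secondary_to_primary[secondary]
    return primary, secondary, tertiary, confidence
```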
CN202311185198.0A 2023-09-14 2023-09-14 Grid event classified search recommendation method and system Active CN116910377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311185198.0A CN116910377B (en) 2023-09-14 2023-09-14 Grid event classified search recommendation method and system

Publications (2)

Publication Number Publication Date
CN116910377A true CN116910377A (en) 2023-10-20
CN116910377B CN116910377B (en) 2023-12-08

Family

ID=88363392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311185198.0A Active CN116910377B (en) 2023-09-14 2023-09-14 Grid event classified search recommendation method and system

Country Status (1)

Country Link
CN (1) CN116910377B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020224097A1 (en) * 2019-05-06 2020-11-12 平安科技(深圳)有限公司 Intelligent semantic document recommendation method and device, and computer-readable storage medium
CN113869060A (en) * 2021-09-23 2021-12-31 北京百度网讯科技有限公司 Semantic data processing method and search method and device
CN115204318A (en) * 2022-09-15 2022-10-18 天津汇智星源信息技术有限公司 Event automatic hierarchical classification method and electronic equipment
CN115408525A (en) * 2022-09-29 2022-11-29 中电科新型智慧城市研究院有限公司 Petition text classification method, device, equipment and medium based on multi-level label
CN115617994A (en) * 2022-10-13 2023-01-17 四川川云智慧智能科技有限公司 Transformer substation equipment defect type identification method and system
CN116150335A (en) * 2022-12-19 2023-05-23 中国电子科技集团公司第二十八研究所 Text semantic retrieval method under military scene
WO2023134084A1 (en) * 2022-01-11 2023-07-20 平安科技(深圳)有限公司 Multi-label identification method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG QINGTIAN et al.: "Python Financial Big Data Risk Control Modeling in Practice: Based on Machine Learning", China Machine Press, pages 297-303 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant