CN116910377A - Grid event classified search recommendation method and system - Google Patents


Info

Publication number
CN116910377A
Authority
CN
China
Prior art keywords
class
text
categories
event
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311185198.0A
Other languages
Chinese (zh)
Other versions
CN116910377B (en)
Inventor
林韶军
黄炳裕
戴文艳
何亦龙
倪坤
黄河
叶威鑫
刘骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Evecom Information Technology Development Co ltd
Original Assignee
Evecom Information Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Evecom Information Technology Development Co ltd
Priority to CN202311185198.0A
Publication of CN116910377A
Application granted
Publication of CN116910377B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a grid event classified search recommendation method comprising the following steps: dividing grid events into three levels, where some second-level categories contain third-level categories; converting the second-level category information of the text classification task into vectors with a sentence vector model and storing them in a vector retrieval library; for a given input text, converting the text into a vector representation with the sentence vector model and recalling the several categories with the highest cosine similarity from the vector retrieval library; constructing features from the recalled category texts and the input text, feeding them into a ranking model to obtain ranking scores, and selecting the second-level category; if the second-level category contains third-level categories, obtaining the third-level category by keyword matching based on the obtained second-level category; and determining the first-level category in reverse from the obtained second-level and third-level categories. By applying a search recommendation algorithm, the method automatically identifies and recommends suitable event categories to the user, improving the accuracy and efficiency of event category selection.

Description

Grid event classified search recommendation method and system
Technical Field
The application relates to the field of grid event management, and in particular to a grid event classified search recommendation method and system.
Background
Urban grid events are highly heterogeneous: they are typically organized in multiple levels and may cover hundreds of distinct categories. When creating a grid event, a user faces a large number of category choices, which makes it difficult to select the correct event category efficiently and accurately.
Existing grid event classification methods typically fine-tune a pre-trained language model on the downstream task. A common approach uses a pre-trained BERT model as the backbone and adds a fully connected classification layer on top of it to build the classification model. During training, event classification is achieved by fine-tuning the model parameters.
The main idea of this method is to apply the semantic representation capability of the pre-trained language model to the specific grid event classification task. By fine-tuning the pre-trained model, the model learns the features and semantic information relevant to event classification; the added fully connected layer then performs the classification and produces the predicted event category.
However, the fine-tuning approach cannot be trained on categories that have no samples, and when categories need to be added or removed, the whole model must be retrained, so its extensibility is poor. Moreover, grid event categories carry rich textual content, yet fine-tuning methods usually represent categories with one-hot encoding. This encoding cannot exploit the semantic information of the category text, so the model struggles to capture the associations and semantic features between events and categories, which harms classification accuracy.
Disclosure of Invention
To solve the above problems, an object of the present application is to provide a grid event classified search recommendation method that automatically identifies and recommends suitable event categories to the user by applying a search recommendation algorithm, thereby improving the accuracy and efficiency of event category selection.
In order to achieve the above purpose, the present application adopts the following technical scheme:
a grid event classified search recommendation method comprises the following steps:
dividing the grid event into three levels, wherein part of the two-level categories comprise three-level categories;
converting the secondary category information in the text classification task into a vector by using a sentence vector model, and storing the vector in a vector retrieval library;
for a given input text, converting the text into vector representation through a sentence vector model, and recalling a plurality of categories with highest cosine similarity from a vector retrieval library;
after the characteristics of the plurality of category texts and the input text are constructed, inputting a sorting model to obtain sorting scores of the categories to obtain a secondary category;
if the secondary category exists in the tertiary categories, obtaining tertiary categories by adopting a keyword matching technology based on the obtained secondary categories;
and reversely determining the first class based on the obtained second class and the third class.
Further, the grid event hierarchy is constructed as follows: the topic serves as the first level, the event name serves as the second level, and the third level is a refinement of the second level indicating the urban component where the event occurs.
Further, the sentence vector model is trained with the SimCSE contrastive learning framework, specifically as follows:
the pre-trained model for SimCSE is rocketqa-zh-dureader-query-encoder; the inputs of the two towers are the text and the corresponding category, respectively. The training strategy is in-batch negatives, i.e. the categories of the other sample pairs in the same batch serve as negative samples, and sentence vector representations are learned by maximizing the similarity between positive pairs while minimizing the similarity between negative pairs. The overall optimization objective is:

L_i = -log( exp(sim(h_i, h_i+)/τ) / Σ_{j=1..N} exp(sim(h_i, h_j+)/τ) )

where (h_i, h_i+) is a sample pair (the text vector and its category vector), h_j+ ranges over the category vectors of the other pairs in the same batch, and τ is a preset temperature parameter.
For categories without samples, a positive pair is constructed by adding noise to the text with the Dropout layer: the same category text is encoded twice with different dropout masks to give (h_i, h_i'), and the optimization objective has the same form:

L_i = -log( exp(sim(h_i, h_i')/τ) / Σ_{j=1..N} exp(sim(h_i, h_j')/τ) )
further, the training data construction of the sentence vector model is specifically as follows:
for the class with the marked sample, adopting the sample < event title >, event content > and the corresponding secondary class as positive samples;
for the class of the unlabeled sample, the secondary class is used as a positive sample with itself.
Further, the ranking model is built on an XGBoost tree model, specifically as follows:
starting from the cosine distance between the sentence vector representations and the element-wise difference of the sentence vectors, the Jaccard distance between text and category, the BM25 similarity, and the cosine similarity computed with a Word2Vec model are added, giving a 772-dimensional ranking feature;
the true category of the text is taken as the positive sample and labeled 1, and the other recalled results are taken as negative samples and labeled 0, giving the ranking labels;
based on the ranking features and ranking labels, XGBoost builds a tree as follows:
(1) Traverse each ranking feature, try different split points on each feature, compute the gain, and split at the point with the maximum gain. The gain is computed as

Gain = (1/2) [ G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ) ]

where G = Σ g_i and H = Σ h_i, with g_i and h_i the first and second derivatives of the loss function; the subscripts L and R denote the left and right subtrees, and λ is the regularization coefficient. For the logistic loss, g_i = p_i - y_i and h_i = p_i(1 - p_i), where y_i is the true label of the sample and p_i is the predicted probability; when building the first tree, the predicted probability of every sample is 0.5.
(2) Repeat the above step for the split nodes until the depth of the tree reaches a specified threshold, then stop growing the tree. The value of a leaf node is computed as

w = -G/(H + λ)

(3) Add the next tree; when computing its gains, update the predicted probability p_i of each sample from the trees already built:

p_i = sigmoid( Σ_{m=1..k-1} f_m(x_i) )

After training, the prediction for a sample is computed as

ŷ_i = Σ_{k=1..K} f_k(x_i)

where f_k(x_i) is the prediction of the kth tree for sample x_i.
Further, the objective function of the kth tree of the XGBoost tree model is:

Obj^(k) = Σ_i l( y_i, ŷ_i^(k-1) + f_k(x_i) ) + Ω(f_k)

where Ω represents the complexity of the tree, f_k represents the kth tree, y_i is the true label of the sample, p_i = sigmoid(ŷ_i) is the predicted probability, and x_i denotes the sample features.
Further, the keyword matching uses an Aho-Corasick (AC) automaton, specifically as follows:
a dictionary tree (trie) is constructed from keywords and third-level categories, where the keywords of each third-level category are obtained from historical data or manually curated data;
during matching, keywords for third-level categories of the urban component type are extracted from the event title and event content, while keywords for third-level categories of the site location type are extracted from the event title, event content, and event place.
A grid event classified search recommendation system comprises a user terminal, a text preprocessing module, a text vectorization module, a text search recommendation module, and a vector retrieval library;
grid events are divided into three levels, where some second-level categories contain third-level categories; the second-level category information of the text classification task is converted into vectors with a sentence vector model and stored in the vector retrieval library;
the user inputs text at the user terminal; the text preprocessing module concatenates the event title and event content and removes meaningless characters from the text;
the text vectorization module calls the sentence vector model to convert the text to be predicted into a vector;
the text search recommendation module calls the vector retrieval service to recall second-level categories and obtain the top-ranked candidates, constructs ranking features for the recalled second-level categories, and calls XGBoost to rank them and obtain the top category; if the second-level category contains third-level categories, the AC-automaton-based keyword matching algorithm is executed, and the third-level category is obtained when a match succeeds; finally, the first-level category is obtained from the mapping between first-level and second-level categories, and the first-, second-, and third-level categories are returned together with the classification confidence.
The application has the following beneficial effects:
the method automatically identifies and recommends suitable event categories to the user by applying a search recommendation algorithm, improving the accuracy and efficiency of event category selection;
the application is highly extensible: the model does not need to be retrained when second-level categories are added or removed;
even when the training set contains categories without samples, the SimCSE model can still construct positive samples for those categories and train on them, improving classification accuracy.
Drawings
FIG. 1 is a schematic flow chart of the method of the application.
Detailed Description
The application is described in further detail below with reference to the attached drawings and specific examples:
referring to fig. 1, in this embodiment, a grid event classification search recommendation method is provided, which includes the following steps:
dividing the grid event into three levels, wherein part of the two-level categories comprise three-level categories;
converting the secondary category information in the text classification task into a vector by using a sentence vector model, and storing the vector in a vector retrieval library;
for a given input text, converting the text into vector representation through a sentence vector model, and recalling a plurality of categories with highest cosine similarity from a vector retrieval library;
after the characteristics of the plurality of category texts and the input text are constructed, inputting a sorting model to obtain sorting scores of the categories to obtain a secondary category;
if the secondary category exists in the tertiary categories, obtaining tertiary categories by adopting a keyword matching technology based on the obtained secondary categories;
and reversely determining the first class based on the obtained second class and the third class.
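The steps above can be sketched end to end as follows. This is a minimal illustration only: all function names, category names, and data structures below are invented for the example and do not come from the patent.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(text, text_vec, index, rank_fn, l3_keywords, l2_to_l1, top_k=3):
    """Recall -> rank -> keyword match -> map back to the first-level category.
    index: level-2 category name -> sentence vector;
    rank_fn: (text, category) -> ranking score (stand-in for the XGBoost model);
    l3_keywords: level-2 name -> {level-3 name: [keywords]}."""
    # 1. recall the top_k second-level categories by cosine similarity
    recalled = sorted(index, key=lambda c: cosine(text_vec, index[c]), reverse=True)[:top_k]
    # 2. re-rank the recalled candidates and keep the best one
    level2 = max(recalled, key=lambda c: rank_fn(text, c))
    # 3. if this second-level category has third-level categories, keyword-match them
    level3 = next((l3 for l3, kws in l3_keywords.get(level2, {}).items()
                   if any(kw in text for kw in kws)), None)
    # 4. map the second-level category back to its first-level topic
    return l2_to_l1[level2], level2, level3
```

In the real system, `cosine` recall would go through the vector retrieval library and `rank_fn` through the trained XGBoost ranker; the control flow is the same.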
In this embodiment, the grid event hierarchy is constructed as follows:
the first level consists of broad topics such as "urban environment" and "street order";
the second level is the event name, such as "pole broken, pole missing, box obviously rusted" or "operating outside the storefront";
the third level is a refinement of the second level, indicating the urban component or place where the event occurs; for example, the third-level categories of "pole broken, pole missing, box obviously rusted" are the specific kinds of pole, such as "electric pole" and "communication pole".
Examples are as follows:
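As a purely illustrative stand-in for the example table that is not reproduced here (reusing the category names above), the three-level hierarchy can be represented as a nested mapping:

```python
# hypothetical illustration; not the patent's actual category table
hierarchy = {
    "urban environment": {                                  # level 1: topic
        "pole broken, pole missing, box obviously rusted": [  # level 2: event name
            "electric pole", "communication pole",            # level 3: urban component
        ],
    },
    "street order": {
        "operating outside the storefront": [],             # no third-level categories
    },
}
```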
In this embodiment, the sentence vector model is trained with the SimCSE contrastive learning framework, specifically as follows:
the pre-trained model for SimCSE is rocketqa-zh-dureader-query-encoder; the inputs of the two towers are the text and the corresponding category, respectively. The training strategy is in-batch negatives, i.e. the categories of the other sample pairs in the same batch serve as negative samples, and sentence vector representations are learned by maximizing the similarity between positive pairs while minimizing the similarity between negative pairs. The overall optimization objective is:

L_i = -log( exp(sim(h_i, h_i+)/τ) / Σ_{j=1..N} exp(sim(h_i, h_j+)/τ) )

where (h_i, h_i+) is a sample pair (the text vector and its category vector), h_j+ ranges over the category vectors of the other pairs in the same batch, and τ is a preset temperature parameter.
For categories without samples, a positive pair is constructed by adding noise to the text with the Dropout layer: the same category text is encoded twice with different dropout masks to give (h_i, h_i'), and the optimization objective has the same form:

L_i = -log( exp(sim(h_i, h_i')/τ) / Σ_{j=1..N} exp(sim(h_i, h_j')/τ) )
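The in-batch negative objective can be sketched numerically as follows. This is a NumPy illustration of the InfoNCE-style loss, not the patent's actual training code:

```python
import numpy as np

def in_batch_negative_loss(text_vecs, cat_vecs, tau=0.05):
    """SimCSE-style loss: for row i, cat_vecs[i] is the positive and the
    other categories in the same batch are the negatives."""
    a = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    b = cat_vecs / np.linalg.norm(cat_vecs, axis=1, keepdims=True)
    sim = (a @ b.T) / tau                      # (N, N) scaled cosine similarities
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))
```

For a category without samples, `text_vecs` and `cat_vecs` would be two dropout-noised encodings of the same category text, so the loss keeps the same form.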
Further, the training data for the sentence vector model is constructed as follows:
for categories with labeled samples, the sample <event title, event content> and the corresponding second-level category are used as a positive pair;
for categories without labeled samples, the second-level category itself is paired with itself as a positive pair;
in this way, categories without samples can also be trained, increasing their similarity distance from other categories.
Training data construction examples are as follows:
Categories with labeled samples: the collected samples (event title + event content) and the corresponding second-level categories are used as positive samples.
Categories without labeled samples: the second-level category is paired with itself as a positive sample.
The labeled positive sample data are as follows:
The unlabeled positive sample data are as follows:
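Since the original example tables are not reproduced here, the pair construction can be sketched as follows; the helper name, tuple layout, and example strings are illustrative assumptions:

```python
def build_pairs(labeled, all_categories):
    """labeled: list of (event_title, event_content, level2_category) tuples.
    Returns (text, category) positive pairs for SimCSE training."""
    pairs = [(title + content, cat) for title, content, cat in labeled]
    # categories with no labeled sample are paired with themselves; at encoding
    # time, dropout noise makes the two views of the same text differ
    covered = {cat for _, cat in pairs}
    pairs += [(cat, cat) for cat in all_categories if cat not in covered]
    return pairs
```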
In this embodiment, the ranking model is built on an XGBoost tree model, specifically as follows:
Ranking feature construction: the cosine distance between the sentence vector representations (1 dimension), the element-wise difference of the sentence vectors (768 dimensions), the Jaccard distance between text and category (1 dimension), the BM25 similarity (1 dimension), and the cosine similarity computed with a Word2Vec model (1 dimension) are concatenated to give a 772-dimensional ranking feature;
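Concatenating these components can be sketched at the shape level as follows; the vectors and the three scalar scores are assumed to be computed elsewhere:

```python
import numpy as np

def ranking_features(text_vec, cat_vec, jaccard, bm25, w2v_cos):
    """772-dim ranking feature: 1 cosine distance + 768-dim vector difference
    + Jaccard distance + BM25 similarity + Word2Vec cosine similarity."""
    cos = float(text_vec @ cat_vec /
                (np.linalg.norm(text_vec) * np.linalg.norm(cat_vec)))
    return np.concatenate([[1.0 - cos], text_vec - cat_vec,
                           [jaccard, bm25, w2v_cos]])
```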
the true category of the text is taken as the positive sample and labeled 1, and the other recalled results are taken as negative samples and labeled 0, giving the ranking labels;
based on the ranking features and ranking labels, XGBoost builds a tree as follows:
(1) Traverse each ranking feature, try different split points on each feature, compute the gain, and split at the point with the maximum gain. The gain is computed as

Gain = (1/2) [ G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ) ]

where G = Σ g_i and H = Σ h_i, with g_i and h_i the first and second derivatives of the loss function; the subscripts L and R denote the left and right subtrees, and λ is the regularization coefficient. For the logistic loss, g_i = p_i - y_i and h_i = p_i(1 - p_i), where y_i is the true label of the sample and p_i is the predicted probability; when building the first tree, the predicted probability of every sample is 0.5.
(2) Repeat the above step for the split nodes until the depth of the tree reaches a specified threshold, then stop growing the tree. The value of a leaf node is computed as

w = -G/(H + λ)

(3) Add the next tree; when computing its gains, update the predicted probability p_i of each sample from the trees already built:

p_i = sigmoid( Σ_{m=1..k-1} f_m(x_i) )

After training, the prediction for a sample is computed as

ŷ_i = Σ_{k=1..K} f_k(x_i)

where f_k(x_i) is the prediction of the kth tree for sample x_i.
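The gain and leaf-value formulas above can be checked numerically with a small sketch (logistic-loss derivatives; λ is the regularization coefficient):

```python
def logistic_grads(y, p):
    """First and second derivatives of the logistic loss at prediction p."""
    return p - y, p * (1.0 - p)

def split_gain(GL, HL, GR, HR, lam=1.0):
    """Gain of a candidate split: children's scores minus the parent's score."""
    score = lambda G, H: G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR))

def leaf_value(G, H, lam=1.0):
    """Optimal leaf weight for accumulated gradients G and Hessians H."""
    return -G / (H + lam)
```

With the initial prediction p = 0.5 for every sample, each positive sample contributes g = -0.5 and each negative sample g = +0.5, with h = 0.25, so a split that separates positives from negatives yields a strictly positive gain.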
In this embodiment, the objective function of the kth tree of the XGBoost tree model is:

Obj^(k) = Σ_i l( y_i, ŷ_i^(k-1) + f_k(x_i) ) + Ω(f_k)

where Ω represents the complexity of the tree, f_k represents the kth tree, y_i is the true label of the sample, p_i = sigmoid(ŷ_i) is the predicted probability, and x_i denotes the sample features.
In this embodiment, the keyword matching uses an Aho-Corasick (AC) automaton, specifically as follows:
a dictionary tree (trie) is constructed from keywords and third-level categories, where the keywords of each third-level category are obtained from historical data or manually curated data; for example, the keywords of the third-level category "communication manhole cover" include "mobile manhole cover", "telecom manhole cover", "communication manhole cover", and the like.
During matching, keywords for third-level categories of the urban component type are extracted from the event title and event content, while keywords for third-level categories of the site location type are extracted from the event title, event content, and event place.
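The trie-plus-failure-link construction can be sketched as a compact, illustrative Aho-Corasick implementation (not the patent's code; the keywords and categories in the usage example are invented):

```python
from collections import deque

class ACAutomaton:
    """Minimal Aho-Corasick automaton mapping keywords to third-level categories."""

    def __init__(self, keyword_to_category):
        self.goto = [{}]   # per-node character transitions (node 0 is the root)
        self.fail = [0]    # failure links
        self.out = [[]]    # categories emitted when a node is reached
        for kw, cat in keyword_to_category.items():
            node = 0
            for ch in kw:                       # insert keyword into the trie
                nxt = self.goto[node].get(ch)
                if nxt is None:
                    nxt = self._new_node()
                    self.goto[node][ch] = nxt
                node = nxt
            self.out[node].append(cat)
        self._build_fail()

    def _new_node(self):
        self.goto.append({}); self.fail.append(0); self.out.append([])
        return len(self.goto) - 1

    def _build_fail(self):
        q = deque(self.goto[0].values())        # depth-1 nodes fail to the root
        while q:
            u = q.popleft()
            for ch, v in self.goto[u].items():
                f = self.fail[u]                # follow failure links upward
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[v] = self.goto[f].get(ch, 0)
                self.out[v] = self.out[v] + self.out[self.fail[v]]
                q.append(v)

    def match(self, text):
        node, hits = 0, []
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            hits.extend(self.out[node])         # all keywords ending here
        return hits
```

The automaton scans the concatenated event title / content / place once, in time linear in the text length regardless of the number of keywords, which is why it suits matching many third-level category keywords at once.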
In this embodiment, a grid event classified search recommendation system is also provided, comprising a user terminal, a text preprocessing module, a text vectorization module, a text search recommendation module, and a vector retrieval library;
grid events are divided into three levels, where some second-level categories contain third-level categories; the second-level category information of the text classification task is converted into vectors with a sentence vector model and stored in the vector retrieval library;
the user inputs text at the user terminal; the text preprocessing module concatenates the event title and event content and removes meaningless characters from the text;
the text vectorization module calls the sentence vector model to convert the text to be predicted into a vector;
the text search recommendation module calls the vector retrieval service to recall second-level categories and obtain the top-ranked candidates, constructs ranking features for the recalled second-level categories, and calls XGBoost to rank them and obtain the top category; if the second-level category contains third-level categories, the AC-automaton-based keyword matching algorithm is executed, and the third-level category is obtained when a match succeeds;
urban component categories are matched on "event title" + "event content";
site location categories are matched on "event title" + "event content" + "event place";
finally, the first-level category is obtained from the mapping between first-level and second-level categories, and the first-, second-, and third-level categories are returned together with the classification confidence.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description presents only preferred embodiments of the present application and is not intended to limit the application in any way; any person skilled in the art may modify or adapt the disclosed technical content into equivalent embodiments. However, any simple modification or equivalent variation of the above embodiments made according to the technical substance of the present application still falls within the protection scope of the technical solution of the present application.

Claims (8)

1. A grid event classified search recommendation method, comprising the following steps:
dividing grid events into three levels, where some second-level categories contain third-level categories;
converting the second-level category information of the text classification task into vectors with a sentence vector model and storing them in a vector retrieval library;
for a given input text, converting the text into a vector representation with the sentence vector model and recalling the several categories with the highest cosine similarity from the vector retrieval library;
constructing features from the recalled category texts and the input text, feeding them into a ranking model to obtain ranking scores, and selecting the second-level category;
if the second-level category contains third-level categories, obtaining the third-level category by keyword matching based on the obtained second-level category;
and determining the first-level category in reverse from the obtained second-level and third-level categories.
2. The grid event classified search recommendation method according to claim 1, wherein the grid event hierarchy is constructed as follows: the topic serves as the first level, the event name serves as the second level, and the third level is a refinement of the second level indicating the urban component where the event occurs.
3. The grid event classified search recommendation method according to claim 1, wherein the sentence vector model is trained with the SimCSE contrastive learning framework, specifically as follows:
the pre-trained model for SimCSE is rocketqa-zh-dureader-query-encoder; the inputs of the two towers are the text and the corresponding category, respectively. The training strategy is in-batch negatives, i.e. the categories of the other sample pairs in the same batch serve as negative samples, and sentence vector representations are learned by maximizing the similarity between positive pairs while minimizing the similarity between negative pairs. The overall optimization objective is:

L_i = -log( exp(sim(h_i, h_i+)/τ) / Σ_{j=1..N} exp(sim(h_i, h_j+)/τ) )

where (h_i, h_i+) is a sample pair (the text vector and its category vector), h_j+ ranges over the category vectors of the other pairs in the same batch, and τ is a preset temperature parameter.
For categories without samples, a positive pair is constructed by adding noise to the text with the Dropout layer: the same category text is encoded twice with different dropout masks to give (h_i, h_i'), and the optimization objective has the same form:

L_i = -log( exp(sim(h_i, h_i')/τ) / Σ_{j=1..N} exp(sim(h_i, h_j')/τ) )
4. The grid event classified search recommendation method according to claim 3, wherein the training data for the sentence vector model is constructed as follows:
for categories with labeled samples, the sample <event title, event content> and the corresponding second-level category are used as a positive pair;
for categories without labeled samples, the second-level category itself is paired with itself as a positive pair.
5. The grid event classification search recommendation method according to claim 1, wherein the ranking model is constructed based on an XGBoost tree model, specifically comprising the following steps:
based on the cosine distance represented by the sentence vectors and the difference of the sentence vectors, the Jacquard distance between the text and the category, BM25 similarity and cosine similarity calculated by using a Word2Vec model are added, and 772-dimensional ordering characteristics are obtained;
taking the actual category of the text as a positive sample, marking 1, taking other recall results as negative samples, marking 0, and obtaining a sequencing label;
based on the ranking features and ranking labels, XGBoost builds a tree as follows:
(1) Traverse each ranking feature, evaluate candidate split points on it, compute the gain of each, and split at the point with the maximum gain, wherein the gain is computed as:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

wherein $G=\sum_i g_i$ and $H=\sum_i h_i$ are the sums of the first and second derivatives $g_i$ and $h_i$ of the loss function, the subscripts $L$ and $R$ denote the left and right subtrees, $\lambda$ is the regular term coefficient and $\gamma$ the split penalty, $y_i$ is the true label of a sample and $\hat{y}_i$ its predicted probability; when the first tree is constructed, the predicted probability of every sample is 0.5;
(2) Repeat step (1) for the split nodes until the depth of the tree reaches the specified threshold, then stop growing the tree; the value of each leaf node is computed as:

$$w_j = -\frac{G_j}{H_j+\lambda}$$

wherein $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the samples falling into leaf $j$;
(3) Add the next tree; when computing the gain, update the predicted probability of each sample with the output of the previous tree:

$$\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i)$$
after training, the prediction for a sample is computed with the following formula:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

wherein $f_k(x_i)$ represents the prediction of sample $x_i$ in the $k$-th tree.
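The gain, leaf-value, and derivative computations in the tree-building steps above can be sketched numerically; this follows the standard XGBoost formulas rather than any patent-specific variant, and the function names are assumptions:

```python
def logloss_grad_hess(y, p):
    """First and second derivatives of log loss w.r.t. the prediction.
    At the first tree every p is 0.5, giving g = p - y, h = p * (1 - p)."""
    return p - y, p * (1.0 - p)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of one candidate split.

    g_*/h_* are the sums of first/second derivatives over the samples in
    the left and right subtree; lam is the regularization coefficient,
    gamma the split penalty.
    """
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

def leaf_value(g_sum, h_sum, lam=1.0):
    """Optimal leaf weight w = -G / (H + lam)."""
    return -g_sum / (h_sum + lam)
```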
6. The grid event classification search recommendation method of claim 5, wherein in the XGBoost tree model, the objective function of the $k$-th tree is as follows:

$$\mathrm{Obj}^{(k)} = \sum_{i} l\!\left(y_i,\, \hat{y}_i^{(k-1)} + f_k(x_i)\right) + \Omega(f_k)$$

wherein $\Omega$ represents the complexity of the tree, $f_k$ represents the $k$-th tree, $y_i$ is the true label of a sample, $\hat{y}_i$ is the predicted probability, and $x_i$ represents the sample features.
7. The grid event classification search recommendation method of claim 1, wherein the keyword matching technique adopts an AC automaton, specifically comprising the following steps:
constructing a dictionary tree composed of keywords and tertiary categories, wherein the keywords of each tertiary category are obtained from historical data or manually curated data;
during keyword matching, keywords of the tertiary categories under the city-component class are extracted from the event title and event content, and keywords of the tertiary categories under the venue-location class are extracted from the event title, event content, and event location.
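The dictionary tree and matching step above amount to a standard Aho-Corasick automaton; a compact self-contained sketch (illustrative names, not the patent's code) is:

```python
from collections import deque

def build_automaton(keyword_to_category):
    """Build a minimal Aho-Corasick automaton over the keyword dictionary.

    keyword_to_category maps each keyword to its tertiary category; the
    structure mirrors the dictionary tree described in the claim.
    """
    trie = [{"next": {}, "fail": 0, "out": []}]
    for word, cat in keyword_to_category.items():
        node = 0
        for ch in word:
            nxt = trie[node]["next"].get(ch)
            if nxt is None:
                nxt = len(trie)
                trie.append({"next": {}, "fail": 0, "out": []})
                trie[node]["next"][ch] = nxt
            node = nxt
        trie[node]["out"].append((word, cat))
    # Breadth-first pass to set failure links and merge output lists.
    queue = deque(trie[0]["next"].values())
    while queue:
        node = queue.popleft()
        for ch, child in trie[node]["next"].items():
            f = trie[node]["fail"]
            while f and ch not in trie[f]["next"]:
                f = trie[f]["fail"]
            trie[child]["fail"] = trie[f]["next"].get(ch, 0)
            trie[child]["out"] += trie[trie[child]["fail"]]["out"]
            queue.append(child)
    return trie

def match_keywords(trie, text):
    """Scan text once, returning every (keyword, category) hit."""
    node, hits = 0, []
    for ch in text:
        while node and ch not in trie[node]["next"]:
            node = trie[node]["fail"]
        node = trie[node]["next"].get(ch, 0)
        hits.extend(trie[node]["out"])
    return hits
```

A single pass over the spliced title/content text therefore finds all dictionary keywords, regardless of how many tertiary categories contribute keywords.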
8. A system implementing the grid event classification search recommendation method according to any one of claims 1-7, comprising a user side, a text preprocessing module, a text vectorization module, a text search recommendation module, and a vector retrieval library;
the grid events are divided into three levels, part of the secondary categories contain tertiary categories, and the secondary category information in the text classification task is converted into vectors by the sentence vector model and stored in the vector retrieval library;
a user inputs text at the user side; the text preprocessing module splices the event title and event content and removes meaningless characters from the text;
the text vectorization module calls the sentence vector model to convert the text to be predicted into a vector;
the text search recommendation module calls the vector retrieval service to recall secondary categories and obtain the top-ranked ones, constructs ranking features for the recalled secondary categories, and calls XGBoost to rank them and obtain the top category; if tertiary categories exist under the secondary category, a keyword matching algorithm based on the AC automaton is executed, and the tertiary category is obtained if matching succeeds; finally, the primary category is obtained according to the mapping relation between primary and secondary categories, and the primary, secondary, and tertiary categories are returned together with the classification confidence.
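The recall, rank, keyword-match, and mapping stages of the system claim can be wired together as follows; `recall`, `rerank`, and `kw_match` stand in for the vector retrieval service, the XGBoost ranker, and the AC-automaton matcher, and all names and signatures here are illustrative assumptions rather than the patent's API:

```python
def classify_event(title, content, place, recall, rerank, kw_match,
                   secondary_to_primary, top_k=10):
    """End-to-end sketch of the search-recommendation flow."""
    # Preprocessing: splice title and content (meaningless-character
    # removal is omitted here for brevity).
    text = f"{title},{content}"
    # Recall the top secondary categories from the vector index.
    candidates = [cat for cat, _ in recall(text, top_k)]
    # Re-rank the recalled categories with the tree model.
    secondary, confidence = rerank(text, candidates)[0]
    # Try to refine to a tertiary category via keyword matching
    # (returns None when the secondary category has no tertiary level
    # or no keyword matches).
    tertiary = kw_match(secondary, title, content, place)
    # Map the secondary category back to its primary category.
    primary = secondary_to_primary[secondary]
    return primary, secondary, tertiary, confidence
```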
CN202311185198.0A 2023-09-14 2023-09-14 Grid event classified search recommendation method and system Active CN116910377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311185198.0A CN116910377B (en) 2023-09-14 2023-09-14 Grid event classified search recommendation method and system

Publications (2)

Publication Number Publication Date
CN116910377A true CN116910377A (en) 2023-10-20
CN116910377B CN116910377B (en) 2023-12-08

Family

ID=88363392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311185198.0A Active CN116910377B (en) 2023-09-14 2023-09-14 Grid event classified search recommendation method and system

Country Status (1)

Country Link
CN (1) CN116910377B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020224097A1 (en) * 2019-05-06 2020-11-12 平安科技(深圳)有限公司 Intelligent semantic document recommendation method and device, and computer-readable storage medium
CN113869060A (en) * 2021-09-23 2021-12-31 北京百度网讯科技有限公司 Semantic data processing method and search method and device
CN115204318A (en) * 2022-09-15 2022-10-18 天津汇智星源信息技术有限公司 Event automatic hierarchical classification method and electronic equipment
CN115408525A (en) * 2022-09-29 2022-11-29 中电科新型智慧城市研究院有限公司 Petition text classification method, device, equipment and medium based on multi-level label
CN115617994A (en) * 2022-10-13 2023-01-17 四川川云智慧智能科技有限公司 Transformer substation equipment defect type identification method and system
CN116150335A (en) * 2022-12-19 2023-05-23 中国电子科技集团公司第二十八研究所 Text semantic retrieval method under military scene
WO2023134084A1 (en) * 2022-01-11 2023-07-20 平安科技(深圳)有限公司 Multi-label identification method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG QINGTIAN et al.: "Python Financial Big Data Risk Control Modeling in Practice: Based on Machine Learning", China Machine Press, pages 297-303 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant