CN116450777A - Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis - Google Patents

Info

Publication number
CN116450777A
Authority
CN
China
Prior art keywords
word
appeal
hot
hot spot
spot word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310463901.3A
Other languages
Chinese (zh)
Inventor
殷蓓
夏琳慜
高淑婷
吕湛
祁伟
高敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-04-26
Filing date
2023-04-26
Publication date
2023-07-18
Application filed by Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202310463901.3A
Publication of CN116450777A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an electricity consumption appeal resolution method and system based on NLP and hot-spot token (word element) analysis, and belongs to the technical field of power data processing. The method screens potential hot-spot tokens from an electricity consumption appeal work order and calculates their correlation with a multidimensional hot-spot token set, which combines three dimensions: the set of standard hot-spot words for electricity consumption appeals, the set of dialect words that express the meaning of those standard hot-spot words, and the set of their homophones. Potential hot-spot tokens whose correlation exceeds a correlation threshold are taken as keywords. The association degree between the sentences spliced from these keywords and each appeal in an appeal classification library is then calculated, and the appeal with the highest association degree is taken as the user's electricity consumption appeal intention. This addresses the inaccurate identification of work-order intent in the prior art, which does not recognize dialect words and homophones, and can improve the resolution of users' electricity consumption appeals.

Description

Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis
Technical Field
The invention relates to an electricity consumption appeal resolution method and system based on NLP and hot-spot token (word element) analysis, and belongs to the technical field of power data processing.
Background
A power supply service command center receives a large number of non-emergency-repair appeal work orders. With millions of users, the electricity consumption environments and the users' appeals are highly varied, spanning 7 major categories and 272 minor categories, and work orders are initially classified only by the work-order type marked by the customer service center. However, many users are elderly, and emotional language is often added when a work order is filled in, so work orders contain dialect vocabulary and are heavily colloquial. In addition, careless filling leads to characters being replaced by wrong homophones. These factors make manual resolution by customer service more difficult, cause the intent of the work order to be identified inaccurately, significantly hamper fine-grained handling of work orders, and increase the risk of user complaints.
Disclosure of Invention
The technical problem to be solved by the invention is that dialect words and miswritten homophones in work orders reduce the accuracy of electricity consumption appeal resolution.
To solve this technical problem, the first technical solution provided by the invention is an electricity consumption appeal resolution method based on NLP and hot-spot token analysis, comprising the following steps:
S1, extracting the text on an electricity consumption appeal work order to obtain a text set;
S2, segmenting the text set according to contextual semantics to obtain basic tokens, and comparing the basic tokens with professional tokens and merging them to obtain professional basic tokens;
S3, screening, from the professional basic tokens, potential hot-spot tokens that either are standard hot-spot tokens or express the meaning of one, and calculating their correlation with a multidimensional hot-spot token set, the potential hot-spot tokens whose correlation exceeds a correlation threshold being taken as keywords; the standard hot-spot tokens are provided by experts in the field;
S4, splicing the keywords in different word orders to obtain a set of keyword-spliced sentences;
S5, calculating, one by one, the association degree between each item in the set of keyword-spliced sentences and each appeal in an appeal classification library, and taking the appeal in the library with the maximum association degree as the electricity consumption appeal resolution result; the appeal classification library is a keyword classification library obtained by extracting the natural-text paragraphs of the accepted content from classified historical electricity consumption appeal files, performing lexical and grammatical analysis on them with NLP, and clustering.
Further, the multidimensional hot-spot token set in step S3 is expressed as $N = (X, X(C_s), X(O_m))$, where $X$ is the set of standard hot-spot words for electricity consumption appeals, and $X(C_s)$ and $X(O_m)$ are, respectively, the dialect word set and the homophone set corresponding to the standard hot-spot word set.
Further, the correlation in step S3 is calculated by formula (1),
in formula (1), $a_i$ is the frequency of occurrence of the $i$-th token in the potential hot-spot tokens, taken as 1; $b_i$ is the frequency with which the $i$-th token appears in the multidimensional hot-spot token set.
Further, the association degree in step S5 is calculated by formula (2),
in formula (2), $a_i$ is the frequency of occurrence of the $i$-th keyword in the set of keyword-spliced sentences; $b_j$ is the frequency of occurrence of the $j$-th keyword in a single appeal of the appeal classification library; $n1$ is the number of keywords in a single keyword-spliced sentence vector; $n3$ is the number of keywords in a single appeal data vector.
To solve this technical problem, the second technical solution provided by the invention is an electricity consumption appeal resolution system based on NLP and hot-spot token analysis, comprising the following modules:
an appeal extraction module, used for extracting the text on a user appeal work order to obtain a text set;
a text recognition module, used for segmenting the text set according to contextual semantics to obtain basic tokens, and comparing the basic tokens with professional tokens and merging them to obtain professional basic tokens;
a keyword extraction module, used for screening, from the professional basic tokens, potential hot-spot tokens that either are standard hot-spot tokens or express the meaning of one, calculating their correlation with the multidimensional hot-spot token set, and taking the potential hot-spot tokens whose correlation exceeds a correlation threshold as keywords;
a keyword splicing module, used for splicing the keywords in different word orders to obtain a plurality of keyword-spliced sentences;
an appeal resolution module, used for calculating the association degree between each item in the set of keyword-spliced sentences and each appeal in the appeal classification library, and taking the appeal in the library with the maximum association degree as the electricity consumption appeal;
a database module, used for storing, by category, the appeal classification library data and the multidimensional hot-spot token library data, the latter consisting of standard hot-spot word data and the dialect-word and homophone data that express the meaning of the standard hot-spot words.
As can be seen from the above technical solutions, the invention has the following advantages:
the method screens potential hot-spot tokens from the electricity consumption appeal work order and calculates their correlation with the multidimensional hot-spot token set, which combines the set of standard hot-spot words for electricity consumption appeals, the set of dialect words that express the meaning of the standard hot-spot words, and the homophone set. Potential hot-spot tokens whose correlation exceeds the correlation threshold are taken as keywords, the association degree between the keyword-spliced sentences and the appeals in the appeal classification library is then calculated, and the appeal with the highest association degree is taken as the user's electricity consumption appeal intention. This addresses the inaccurate identification of work-order intent in the prior art, which does not accurately recognize dialect words and homophones, and can improve the resolution of users' electricity consumption appeals.
Drawings
Fig. 1 is a flowchart of the electricity consumption appeal resolution method based on NLP and hot-spot token analysis in this embodiment.
Fig. 2 is a schematic structural diagram of the electricity consumption appeal resolution system based on NLP and hot-spot token analysis in this embodiment.
Detailed Description
An embodiment of the electricity consumption appeal resolution method and system provided by the invention is described in detail below. As shown in Fig. 1, the method comprises the following steps:
S1, extracting the text on an electricity consumption appeal work order to obtain a text set;
S2, segmenting the text set according to contextual semantics to obtain basic tokens, and comparing the basic tokens with professional tokens and merging them to obtain professional basic tokens;
S3, screening, from the professional basic tokens, potential hot-spot tokens that either are standard hot-spot tokens or express the meaning of one, and calculating their correlation with a multidimensional hot-spot token set, the potential hot-spot tokens whose correlation exceeds a correlation threshold being taken as keywords; the standard hot-spot tokens are provided by experts in the field;
For example, extracting the text on a work order yields the text set "primary wiring circuit troubleshooting at position A". Segmenting this text set according to contextual semantics yields the basic tokens "A, position, primary wiring, circuit, troubleshooting and maintenance", of which "primary wiring" is a professional token. After the basic tokens are compared and merged with the professional tokens, the professional basic tokens "A, position, primary wiring, circuit, troubleshooting and maintenance" are obtained. The professional tokens are downloaded from the website https://www.taodocs.com/p-282955323.Html.
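The compare-and-merge part of step S2 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes the segmenter has already produced a list of basic tokens, that the professional tokens are available as a plain set of strings, and that a professional token split across adjacent basic tokens should be rejoined. The function name `merge_professional_tokens`, the `sep` and `max_span` parameters, and the sample tokens (in which "primary wiring" is assumed to have been split in two) are hypothetical.

```python
from typing import Iterable, List, Set


def merge_professional_tokens(
    basic_tokens: List[str],
    professional_terms: Iterable[str],
    sep: str = " ",      # use "" for unspaced scripts such as Chinese
    max_span: int = 4,   # hypothetical cap on how many tokens one term may span
) -> List[str]:
    """Greedily rejoin adjacent basic tokens that together form a professional term."""
    terms: Set[str] = set(professional_terms)
    merged: List[str] = []
    i = 0
    while i < len(basic_tokens):
        # Try the longest candidate span first so longer terms win.
        for span in range(min(max_span, len(basic_tokens) - i), 1, -1):
            candidate = sep.join(basic_tokens[i:i + span])
            if candidate in terms:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token professional term starts here; keep the basic token.
            merged.append(basic_tokens[i])
            i += 1
    return merged


# Hypothetical segmenter output in which "primary wiring" was split.
basic = ["A", "position", "primary", "wiring", "circuit", "troubleshooting"]
professional_basic = merge_professional_tokens(basic, {"primary wiring"})
print(professional_basic)
# ['A', 'position', 'primary wiring', 'circuit', 'troubleshooting']
```

Trying the longest span first is a design choice that prefers the most specific professional term when several candidates overlap.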
Experts in the field provide, based on experience, the standard hot-spot words for the different electricity consumption appeals. A dialect dictionary and a homophone dictionary are downloaded from the network, and a dictionary-based Query algorithm is used to extract the dialect words and homophones that express the meaning of each hot-spot word. The standard hot-spot words, the dialect words and the homophones are then combined as dimensions to obtain the multidimensional hot-spot token set $N = (X, X(C_s), X(O_m))$, where $X$ is the set of standard hot-spot words for electricity consumption appeals, and $X(C_s)$ and $X(O_m)$ are, respectively, the dialect word set and the homophone set corresponding to the standard hot-spot words. For example, the $k$-th standard hot-spot word $x_k$ in the hot-spot vocabulary database corresponds to $s$ synonymous dialect words $(x_k(C_1), x_k(C_2), \ldots, x_k(C_s))$ and $m$ possible erroneous homophones $(x_k(O_1), x_k(O_2), \ldots, x_k(O_m))$.
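One possible in-memory representation of the set $N$ is sketched below. It is an assumption made for illustration: a plain dictionary lookup stands in for the dictionary-based Query algorithm, and the class `HotspotEntry`, the functions `build_hotspot_set` and `to_standard`, and all sample words are hypothetical placeholders rather than data from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping, Optional


@dataclass
class HotspotEntry:
    standard: str                                          # x_k
    dialect: List[str] = field(default_factory=list)       # x_k(C_1) ... x_k(C_s)
    homophones: List[str] = field(default_factory=list)    # x_k(O_1) ... x_k(O_m)


def build_hotspot_set(
    standard_words: Iterable[str],
    dialect_dict: Mapping[str, List[str]],
    homophone_dict: Mapping[str, List[str]],
) -> Dict[str, HotspotEntry]:
    """Combine the three dimensions into N, keyed by standard hot-spot word."""
    return {
        word: HotspotEntry(
            standard=word,
            dialect=list(dialect_dict.get(word, [])),
            homophones=list(homophone_dict.get(word, [])),
        )
        for word in standard_words
    }


def to_standard(token: str, hotspot_set: Mapping[str, HotspotEntry]) -> Optional[str]:
    """Return the standard hot-spot word a token stands for, or None if unknown."""
    for entry in hotspot_set.values():
        if token == entry.standard or token in entry.dialect or token in entry.homophones:
            return entry.standard
    return None


# Placeholder dictionaries; real ones would come from the downloaded resources.
N = build_hotspot_set(
    standard_words=["circuit"],
    dialect_dict={"circuit": ["dialect-word-for-circuit"]},
    homophone_dict={"circuit": ["electric furnace"]},
)
print(to_standard("electric furnace", N))
```

Keying the structure by standard word keeps each $x_k$ together with its own variants, which mirrors the per-word correspondence described above.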
Next, the correlation between the potential hot-spot tokens and the multidimensional hot-spot token set is calculated by the following formula:
where $a_i$ is the frequency of occurrence of the $i$-th token in the potential hot-spot tokens, taken as 1; $b_i$ is the frequency with which the $i$-th token appears in the multidimensional hot-spot token set.
Finally, a correlation threshold is set, and the hot-spot tokens whose correlation $I_{xgd}$ exceeds the threshold are taken as keywords. For example, after calculation and screening, "circuit" is selected as the keyword from among (circuits, pads, electric furnaces, electronic circuits).
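The threshold screening can be illustrated with a short sketch. Because the correlation formula itself is not reproduced in this text, the code below substitutes an assumed measure: the cosine similarity between an all-ones vector $a$ (one entry per distinct unit of the candidate) and a count vector $b$ (how often each of those units occurs in the multidimensional set). This stand-in, the function names `correlation` and `screen_keywords`, the single-letter units, and the 0.9 threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def correlation(candidate_units: Sequence[str], hotspot_units: Iterable[str]) -> float:
    """Assumed stand-in for the correlation: cosine between a (all ones) and b (counts)."""
    units = list(dict.fromkeys(candidate_units))   # distinct units, original order
    counts = Counter(hotspot_units)
    a = [1] * len(units)                           # a_i is taken as 1
    b = [counts[u] for u in units]                 # b_i: frequency in the hot-spot set
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def screen_keywords(
    candidates: Iterable[Sequence[str]],
    hotspot_units: Iterable[str],
    threshold: float,
) -> List[Sequence[str]]:
    """Keep the candidates whose correlation is strictly greater than the threshold."""
    pool = list(hotspot_units)
    return [c for c in candidates if correlation(list(c), pool) > threshold]


# Toy data: each letter stands for one unit (for example, one character).
keywords = screen_keywords(["abc", "abz", "xyz"], "aabbcc", threshold=0.9)
print(keywords)  # ['abc']
```

A candidate whose units all occur in the set scores close to 1, while a candidate with unseen units scores lower, so the threshold separates the two.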
S4, splicing the keywords in different word orders to obtain a set of keyword-spliced sentences;
S5, calculating, one by one, the association degree between each item in the set of keyword-spliced sentences and each appeal in an appeal classification library, and taking the appeal in the library with the maximum association degree as the electricity consumption appeal resolution result; the appeal classification library is a keyword classification library obtained by extracting the natural-text paragraphs of the accepted content from classified historical electricity consumption appeal files, performing lexical and grammatical analysis on them with NLP, and clustering.
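The splicing in step S4 can be sketched as enumerating the orderings of the keywords. This is one reading of "different word orders", assumed here for illustration; the function name `splice_keywords`, the `sep` parameter, and the sample keywords are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence


def splice_keywords(keywords: Sequence[str], sep: str = " ") -> List[str]:
    """Join the keywords in every distinct order to form candidate sentences."""
    # dict.fromkeys removes duplicates (possible when keywords repeat)
    # while preserving the order in which permutations are generated.
    return list(dict.fromkeys(sep.join(p) for p in permutations(keywords)))


sentences = splice_keywords(["circuit", "fault", "maintenance"])
print(len(sentences))  # 6
print(sentences[0])    # circuit fault maintenance
```

The number of orderings grows factorially with the number of keywords, so a practical system would likely cap the keyword count before splicing.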
The association degree between each item in the set of keyword-spliced sentences and each piece of appeal data in the appeal classification library is calculated one by one, by the following formula:
where $a_i$ is the frequency of occurrence of the $i$-th keyword in the set of keyword-spliced sentences; $b_j$ is the frequency of occurrence of the $j$-th keyword in a single appeal of the appeal classification library; $n1$ is the number of keywords in a single keyword-spliced sentence vector; $n3$ is the number of keywords in a single appeal data vector.
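The selection of the best-matching appeal can be sketched as a search over all pairs of spliced sentence and library entry. The association formula is not reproduced in this text, so the sketch assumes cosine similarity over keyword frequency counts as a stand-in; the functions `association` and `resolve_appeal`, the library layout (appeal label mapped to its keyword list), and the sample entries are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def association(sentence_keywords: List[str], appeal_keywords: List[str]) -> float:
    """Assumed stand-in for the association degree: cosine over keyword counts."""
    a = Counter(sentence_keywords)   # a_i: keyword frequencies in the spliced sentence
    b = Counter(appeal_keywords)     # b_j: keyword frequencies in one library appeal
    dot = sum(a[k] * b[k] for k in a)   # Counter yields 0 for keywords missing from b
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve_appeal(
    spliced_sentences: Iterable[str],
    library: Dict[str, List[str]],
) -> Tuple[str, float]:
    """Return the library appeal with the maximum association degree and its score."""
    best_label, best_score = "", -1.0
    for sentence in spliced_sentences:
        words = sentence.split()
        for label, appeal_keywords in library.items():
            score = association(words, appeal_keywords)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


# Hypothetical two-entry appeal classification library.
library = {
    "line fault repair": ["circuit", "fault", "maintenance"],
    "billing query": ["bill", "fee", "meter"],
}
label, score = resolve_appeal(["circuit fault", "fault circuit"], library)
print(label)  # line fault repair
```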
This embodiment also provides an electricity consumption appeal resolution system based on NLP and hot-spot token analysis, comprising the following modules:
an appeal extraction module, used for extracting the text on a user appeal work order to obtain a text set;
a text recognition module, used for segmenting the text set according to contextual semantics to obtain basic tokens, and comparing the basic tokens with professional tokens and merging them to obtain professional basic tokens;
a keyword extraction module, used for screening, from the professional basic tokens, potential hot-spot tokens that either are standard hot-spot tokens or express the meaning of one, calculating their correlation with the multidimensional hot-spot token set, and taking the potential hot-spot tokens whose correlation exceeds a correlation threshold as keywords;
a keyword splicing module, used for splicing the keywords in different word orders to obtain a plurality of keyword-spliced sentences;
an appeal resolution module, used for calculating the association degree between each item in the set of keyword-spliced sentences and each appeal in the appeal classification library, and taking the appeal in the library with the maximum association degree as the electricity consumption appeal;
a database module, used for storing, by category, the appeal classification library data and the multidimensional hot-spot token library data, the latter consisting of standard hot-spot word data and the dialect-word and homophone data that express the meaning of the standard hot-spot words.
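As one way to picture the database module, the sketch below stores the three kinds of data in an in-memory SQLite database using Python's standard `sqlite3` module. The patent does not specify a storage engine or schema, so the table names, column names, helper functions, and sample rows are all assumptions made for illustration.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: standard hot-spot words, their dialect/homophone
# variants, and the keyword frequencies of each appeal in the library.
SCHEMA = """
CREATE TABLE standard_token (
    id    INTEGER PRIMARY KEY,
    token TEXT NOT NULL UNIQUE
);
CREATE TABLE variant_token (
    id          INTEGER PRIMARY KEY,
    standard_id INTEGER NOT NULL REFERENCES standard_token(id),
    kind        TEXT NOT NULL CHECK (kind IN ('dialect', 'homophone')),
    token       TEXT NOT NULL
);
CREATE TABLE appeal_keyword (
    appeal    TEXT NOT NULL,
    keyword   TEXT NOT NULL,
    frequency INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (appeal, keyword)
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_standard(conn: sqlite3.Connection, token: str) -> int:
    """Insert a standard hot-spot word and return its row id."""
    return conn.execute(
        "INSERT INTO standard_token (token) VALUES (?)", (token,)
    ).lastrowid


def add_variant(conn: sqlite3.Connection, standard_id: int, kind: str, token: str) -> None:
    """Attach a dialect word or homophone to a standard hot-spot word."""
    conn.execute(
        "INSERT INTO variant_token (standard_id, kind, token) VALUES (?, ?, ?)",
        (standard_id, kind, token),
    )


def lookup_standard(conn: sqlite3.Connection, token: str) -> Optional[str]:
    """Map a token (standard word or variant) to its standard hot-spot word."""
    row = conn.execute(
        "SELECT s.token FROM standard_token AS s "
        "LEFT JOIN variant_token AS v ON v.standard_id = s.id "
        "WHERE s.token = ? OR v.token = ? LIMIT 1",
        (token, token),
    ).fetchone()
    return row[0] if row else None


store = open_store()
circuit_id = add_standard(store, "circuit")
add_variant(store, circuit_id, "homophone", "electric furnace")  # placeholder data
print(lookup_standard(store, "electric furnace"))  # circuit
```

Separating variants into their own table with a `kind` column keeps the dialect and homophone dimensions distinct while letting both resolve to the same standard word.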

Claims (5)

1. An electricity consumption appeal resolution method based on NLP and hot-spot token analysis, characterized by comprising the following steps:
S1, extracting the text on an electricity consumption appeal work order to obtain a text set;
S2, segmenting the text set according to contextual semantics to obtain basic tokens, and comparing the basic tokens with professional tokens and merging them to obtain professional basic tokens;
S3, screening, from the professional basic tokens, potential hot-spot tokens that either are standard hot-spot tokens or express the meaning of one, and calculating their correlation with a multidimensional hot-spot token set, the potential hot-spot tokens whose correlation exceeds a correlation threshold being taken as keywords; the standard hot-spot tokens are provided by experts in the field;
S4, splicing the keywords in different word orders to obtain a set of keyword-spliced sentences;
S5, calculating, one by one, the association degree between each item in the set of keyword-spliced sentences and each appeal in an appeal classification library, and taking the appeal in the library with the maximum association degree as the electricity consumption appeal resolution result; the appeal classification library is a keyword classification library obtained by extracting the natural-text paragraphs of the accepted content from classified historical electricity consumption appeal files, performing lexical and grammatical analysis on them with NLP, and clustering.
2. The electricity consumption appeal resolution method based on NLP and hot-spot token analysis according to claim 1, wherein the multidimensional hot-spot token set in step S3 is expressed as $N = (X, X(C_s), X(O_m))$, where $X$ is the set of standard hot-spot words for electricity consumption appeals, and $X(C_s)$ and $X(O_m)$ are, respectively, the dialect word set and the homophone set corresponding to the standard hot-spot word set.
3. The electricity consumption appeal resolution method according to claim 1, wherein the correlation in step S3 is calculated by formula (1),
in formula (1), $a_i$ is the frequency of occurrence of the $i$-th token in the potential hot-spot tokens, taken as 1; $b_i$ is the frequency with which the $i$-th token appears in the multidimensional hot-spot token set.
4. The electricity consumption appeal resolution method based on NLP and hot-spot token analysis according to claim 1, wherein the association degree in step S5 is calculated by formula (2),
in formula (2), $a_i$ is the frequency of occurrence of the $i$-th keyword in the set of keyword-spliced sentences; $b_j$ is the frequency of occurrence of the $j$-th keyword in a single appeal of the appeal classification library; $n1$ is the number of keywords in a single keyword-spliced sentence vector; $n3$ is the number of keywords in a single appeal data vector.
5. An electricity consumption appeal resolution system based on NLP and hot-spot token analysis, characterized by comprising the following modules:
an appeal extraction module, used for extracting the text on a user appeal work order to obtain a text set;
a text recognition module, used for segmenting the text set according to contextual semantics to obtain basic tokens, and comparing the basic tokens with professional tokens and merging them to obtain professional basic tokens;
a keyword extraction module, used for screening, from the professional basic tokens, potential hot-spot tokens that either are standard hot-spot tokens or express the meaning of one, calculating their correlation with the multidimensional hot-spot token set, and taking the potential hot-spot tokens whose correlation exceeds a correlation threshold as keywords;
a keyword splicing module, used for splicing the keywords in different word orders to obtain a plurality of keyword-spliced sentences;
an appeal resolution module, used for calculating the association degree between each item in the set of keyword-spliced sentences and each appeal in the appeal classification library, and taking the appeal in the library with the maximum association degree as the electricity consumption appeal;
a database module, used for storing, by category, the appeal classification library data and the multidimensional hot-spot token library data, the latter consisting of standard hot-spot word data and the dialect-word and homophone data that express the meaning of the standard hot-spot words.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310463901.3A CN116450777A (en) 2023-04-26 2023-04-26 Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310463901.3A CN116450777A (en) 2023-04-26 2023-04-26 Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis

Publications (1)

Publication Number Publication Date
CN116450777A true CN116450777A (en) 2023-07-18

Family

ID=87123650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310463901.3A Pending CN116450777A (en) 2023-04-26 2023-04-26 Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis

Country Status (1)

Country Link
CN (1) CN116450777A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116861915A (en) * 2023-06-02 2023-10-10 国网江苏省电力有限公司南京供电分公司 Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510566A (en) * 2021-11-29 2022-05-17 上海市黄浦区城市运行管理中心(上海市黄浦区城市网格化综合管理中心、上海市黄浦区大数据中心) Hot word mining, classifying and analyzing method and system based on work order
CN114969297A (en) * 2022-06-13 2022-08-30 深圳供电局有限公司 Method for analyzing power customer appeal relevancy
US20230035947A1 (en) * 2019-12-28 2023-02-02 Iflytek Co., Ltd. Voice recognition method and related product
CN115994534A (en) * 2022-12-22 2023-04-21 北京百度网讯科技有限公司 Government scene hot word mining method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李颢;张吉皓;: "基于文本挖掘技术的客服投诉工单自动分类探讨", 移动通信, no. 23, 15 December 2017 (2017-12-15) *

Similar Documents

Publication Publication Date Title
Jung Semantic vector learning for natural language understanding
CN108304468B (en) Text classification method and text classification device
US7295965B2 (en) Method and apparatus for determining a measure of similarity between natural language sentences
CN111950264B (en) Text data enhancement method and knowledge element extraction method
CN102866989A (en) Viewpoint extracting method based on word dependence relationship
CA2777520A1 (en) System and method for phrase identification
Sharma et al. Using Hidden Markov Model to improve the accuracy of Punjabi POS tagger
Krstev et al. Using textual and lexical resources in developing serbian wordnet
CN115858758A (en) Intelligent customer service knowledge graph system with multiple unstructured data identification
CN111489746A (en) Power grid dispatching voice recognition language model construction method based on BERT
US20100094615A1 (en) Document translation apparatus and method
CN116450777A (en) Electricity consumption appeal resolution method and system based on NLP and hot spot word element analysis
CN113158695A (en) Semantic auditing method and system for multi-language mixed text
CN110929518A (en) Text sequence labeling algorithm using overlapping splitting rule
US11599569B2 (en) Information processing device, information processing system, and computer program product for converting a causal relationship into a generalized expression
CN111259661B (en) New emotion word extraction method based on commodity comments
Selamat Improved N-grams approach for web page language identification
Surahio et al. Prediction system for sindhi parts of speech tags by using support vector machine
CN111062210A (en) Neural network-based predicate center word identification method
Jayasuriya et al. Learning a stochastic part of speech tagger for sinhala
Ananth et al. Grammatical tagging for the Kannada text documents using hybrid bidirectional long-short term memory model
CN115455986A (en) Spanish language place name translation method, device, equipment and medium
Fung et al. Mixed language query disambiguation
Saneifar et al. From terminology extraction to terminology validation: an approach adapted to log files
CN103902524A (en) Uygur language sentence boundary recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination