CN110704623A - Method, device, system and storage medium for improving entity identification rate based on Rasa_Nlu framework

Info

Publication number
CN110704623A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910923027.0A
Other languages
Chinese (zh)
Inventor
冯海洪
毛德平
王康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Mic Technology Co Ltd
Original Assignee
Anhui Mic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-27
Filing date
2019-09-27
Publication date
2020-01-17
Application filed by Anhui Mic Technology Co Ltd
Priority to CN201910923027.0A
Publication of CN110704623A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

The invention relates to the field of data processing, and in particular to a method, a device, a system and a storage medium for improving the entity identification rate based on the Rasa_Nlu framework. The method comprises the following steps: first, inputting voice and segmenting words with jieba; then, obtaining and preprocessing the corpus; next, carrying out MITIE model training, using the wordrep tool in MITIE to obtain a data set; and finally, constructing a Rasa_Nlu corpus and a model for intention identification and entity identification, so as to obtain the intention of the user. By applying current natural language processing technology from the field of artificial intelligence, the method can accurately analyze the intention of the user. In a computer scenario, the Rasa_Nlu-based method improves the entity identification rate and thereby addresses the low entity identification rate of existing methods.

Description

Method, device, system and storage medium for improving entity identification rate based on Rasa_Nlu framework
Technical Field
The invention relates to the field of data processing, and in particular to a method, a device, a system and a storage medium for improving the entity identification rate based on the Rasa_Nlu framework.
Background
Natural language processing (NLP) is divided into three links, and most of the difficulties arise in natural language understanding (NLU), where the main problems are ambiguity and unknown language phenomena. On the one hand, ambiguity is pervasive in natural language: at the lexical, syntactic, semantic and pragmatic levels, and in every kind of language unit, it is a fundamental obstacle to reaching the application goal. On the other hand, any specific system may encounter unexpected input such as unknown vocabulary or unknown structures. Every language changes with the development of society: new vocabulary (especially new names of people, places and organizations, and specialized terms), new word senses, new word usages (new word classes) and even new sentence structures keep appearing. In spoken conversation and in online text (microblogs, blogs, etc.), rare and unusual words and sentence structures are especially common.
At present, the entity identification rates of many natural language understanding methods on the market are particularly low. A method for improving the entity identification rate based on the Rasa_Nlu framework in a computer scenario has therefore been developed.
Disclosure of Invention
In view of the above problems in the prior art, and in particular the low entity identification rate of existing methods, the present invention provides a method for improving the entity identification rate based on the Rasa_Nlu framework in a computer scenario, including the following steps:
step S1: inputting voice and segmenting words with jieba;
step S2: obtaining and preprocessing a corpus;
step S3: carrying out MITIE model training, namely training a model with the wordrep tool in MITIE to obtain a data set;
step S4: constructing a Rasa_Nlu corpus and a model for intention identification and entity identification;
step S5: acquiring the intention of the user.
Preferably, in step S4, intention identification classifies the input at the sentence level to determine the intention, and entity identification finds the key entities in the user's question at the word level and fills the entity slots.
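To make the sentence-level versus word-level distinction concrete, the following minimal sketch shows one annotated training example in the JSON-style layout commonly used for Rasa NLU training data: a `text`, a single `intent` label for the whole sentence, and `entities` given as character spans. The sample sentence, the intent name `buy_computer`, the entity name `product` and the helpers `check_spans` and `fill_slots` are illustrative assumptions and do not come from the patent.

```python
# One annotated example: the intent labels the whole sentence,
# while each entity labels a character span inside it.
text = "我想买一台笔记本电脑"
value = "笔记本电脑"
start = text.index(value)

example = {
    "text": text,
    "intent": "buy_computer",  # hypothetical intent name
    "entities": [
        {
            "start": start,
            "end": start + len(value),
            "value": value,
            "entity": "product",  # hypothetical entity (slot) name
        },
    ],
}


def check_spans(ex):
    """Return True if every entity span matches its value in the text."""
    return all(ex["text"][e["start"]:e["end"]] == e["value"] for e in ex["entities"])


def fill_slots(ex):
    """Collect entity values into a slot dictionary keyed by entity name."""
    slots = {}
    for e in ex["entities"]:
        slots.setdefault(e["entity"], []).append(e["value"])
    return slots


print(check_spans(example))
print(fill_slots(example))
```

Checking spans before training is a cheap way to catch annotation offsets that no longer line up with the text after preprocessing.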
In order to achieve the above object, the present invention further provides a device for improving the entity identification rate based on the Rasa_Nlu framework, comprising:
an information input module, used for inputting voice;
an information acquisition and preprocessing module, used for acquiring voice information and preprocessing it;
a MITIE model training module, used for training a model to obtain a data set;
a Rasa_Nlu corpus and model construction module, used for intention identification and entity identification;
and an acquisition module, used for acquiring the intention of the user.
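The patent does not give a configuration for the MITIE training and Rasa_Nlu model-building modules. As a hedged illustration of how jieba, MITIE and Rasa_Nlu are often combined, the sketch below writes a pipeline as a Python dictionary in the style of a rasa_nlu 0.x configuration. The component names are recalled from those releases and should be verified against the installed version; the model path and the `REQUIRES` dependency table are assumptions made for this example only.

```python
# Pipeline description in the style of a rasa_nlu 0.x configuration.
# Component names should be checked against the installed version.
config = {
    "language": "zh",
    "pipeline": [
        # hypothetical path to the feature extractor produced by MITIE's wordrep
        {"name": "nlp_mitie", "model": "data/total_word_feature_extractor_zh.dat"},
        {"name": "tokenizer_jieba"},
        {"name": "ner_mitie"},
        {"name": "ner_synonyms"},
        {"name": "intent_featurizer_mitie"},
        {"name": "intent_classifier_sklearn"},
    ],
}

# Assumed dependencies: each component must run after the ones it consumes.
REQUIRES = {
    "ner_mitie": ["nlp_mitie", "tokenizer_jieba"],
    "intent_featurizer_mitie": ["nlp_mitie", "tokenizer_jieba"],
    "intent_classifier_sklearn": ["intent_featurizer_mitie"],
}


def order_is_valid(cfg):
    """Return True if every component appears after all of its dependencies."""
    names = [component["name"] for component in cfg["pipeline"]]
    for name, deps in REQUIRES.items():
        for dep in deps:
            if names.index(dep) > names.index(name):
                return False
    return True


print(order_is_valid(config))
```

Keeping the tokenizer ahead of both the entity extractor and the intent featurizer reflects the ordering of steps S1 to S4: segmentation first, then MITIE-based features, then the intention and entity models.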
To achieve the above object, the present invention further provides a system for improving the entity identification rate based on the Rasa_Nlu framework, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The beneficial effects of the invention are as follows: by applying current natural language processing technology from the field of artificial intelligence, the intention of a user can be accurately analyzed, and the Rasa_Nlu-based method in a computer scenario can improve the entity identification rate.
Drawings
Fig. 1 is an overall flowchart of the method for improving the entity identification rate based on the Rasa_Nlu framework in embodiment 1 of the present invention.
Fig. 2 is a block diagram of the device for improving the entity identification rate based on the Rasa_Nlu framework in embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is an overall flowchart of embodiment 1 of the method for improving the entity identification rate based on the Rasa_Nlu framework. As shown in fig. 1, the method includes the following steps:
Step S1: inputting voice and segmenting words with jieba.
Step S2: obtaining and preprocessing the corpus.
Step S3: carrying out MITIE model training, namely training a model with the wordrep tool in MITIE to obtain a data set.
Step S4: constructing a Rasa_Nlu corpus and a model for intention identification and entity identification.
In this step, intention identification classifies the input at the sentence level to determine the intention, and entity identification finds the key entities in the user's question at the word level and fills the entity slots.
Step S5: acquiring the intention of the user.
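Steps S1 and S2 can be sketched as follows. This is a minimal, self-contained illustration rather than the patent's implementation: a forward maximum matching segmenter over a toy vocabulary stands in for jieba (in a real system, `jieba.lcut(sentence)` would typically do this job), and the output is a space-delimited line, on the assumption that the corpus handed to MITIE's wordrep tool is pre-segmented text. The vocabulary, the sample sentence and the function names are hypothetical.

```python
import re

# Toy vocabulary standing in for a real segmentation dictionary (assumption).
VOCAB = {"一台", "笔记本", "笔记本电脑", "电脑"}
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def preprocess(raw):
    """Remove punctuation and whitespace, keeping Unicode word characters."""
    return re.sub(r"\W+", "", raw)


def segment(sentence):
    """Forward maximum matching: take the longest vocabulary word at each position."""
    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(MAX_WORD_LEN, len(sentence) - i)
        for size in range(longest, 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted, so the loop always advances.
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


def to_corpus_line(raw):
    """Clean one utterance and join its tokens with spaces for the training corpus."""
    return " ".join(segment(preprocess(raw)))


print(to_corpus_line("我想买一台笔记本电脑！"))
```

Because unknown characters fall back to single-character tokens, out-of-vocabulary words are split rather than dropped, which is why domain terms are usually added to the segmenter's dictionary before the corpus is built.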
Example 2
Fig. 2 is a block diagram of embodiment 2, the device for improving the entity identification rate based on the Rasa_Nlu framework. As shown in fig. 2, the present embodiment provides a device for improving the entity identification rate based on the Rasa_Nlu framework, including:
an information input module, used for inputting voice;
an information acquisition and preprocessing module, used for acquiring voice information and preprocessing it;
a MITIE model training module, used for training a model to obtain a data set;
a Rasa_Nlu corpus and model construction module, used for intention identification and entity identification;
and an acquisition module, used for acquiring the intention of the user.
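The module structure of this embodiment can be pictured as five callables chained in the listed order. The sketch below is a simplification under stated assumptions: every module body is a toy stand-in (no speech recognition, MITIE or Rasa_Nlu code is involved), the training-time MITIE module is reduced to a feature function, and the class name `EntityRecognitionDevice`, the intent name and the sample utterance are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EntityRecognitionDevice:
    """Five modules chained in the order listed in the embodiment."""

    read_input: Callable[[str], str]                    # information input module
    preprocess: Callable[[str], List[str]]              # acquisition and preprocessing module
    featurize: Callable[[List[str]], List[str]]         # stand-in for MITIE-derived features
    classify_and_extract: Callable[[List[str]], Dict]   # stand-in for the Rasa_Nlu model
    select_intent: Callable[[Dict], str]                # acquisition module

    def run(self, utterance: str) -> str:
        """Pass one utterance through all five modules and return its intent."""
        text = self.read_input(utterance)
        tokens = self.preprocess(text)
        features = self.featurize(tokens)
        parsed = self.classify_and_extract(features)
        return self.select_intent(parsed)


# Toy stand-ins so the wiring can be exercised end to end.
device = EntityRecognitionDevice(
    read_input=str.strip,
    preprocess=str.split,
    featurize=lambda tokens: [token.lower() for token in tokens],
    classify_and_extract=lambda feats: {
        "intent": "buy_computer" if "buy" in feats else "unknown",
        "entities": [token for token in feats if token in {"laptop", "desktop"}],
    },
    select_intent=lambda parsed: parsed["intent"],
)

print(device.run("  I want to BUY a laptop  "))
```

Passing the modules in as callables keeps each one replaceable, so a toy stand-in can later be swapped for a real segmenter or trained model without changing the wiring.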
Example 3
The present embodiment provides a system for improving the entity identification rate based on the Rasa_Nlu framework, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
Example 4
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the above-mentioned method.
In summary, the method, device, system and storage medium for improving the entity identification rate based on the Rasa_Nlu framework disclosed in the embodiments of the present invention can accurately analyze the intention of the user by applying current natural language processing technology from the field of artificial intelligence, and the Rasa_Nlu-based method in a computer scenario can improve the entity identification rate.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can understand that the changes or modifications within the technical scope of the present invention are included in the scope of the present invention, and therefore, the scope of the present invention should be subject to the protection scope of the claims.

Claims (5)

1. A method for improving the entity identification rate based on the Rasa_Nlu framework, characterized by comprising the following steps:
step S1: inputting voice and segmenting words with jieba;
step S2: obtaining and preprocessing a corpus;
step S3: carrying out MITIE model training, namely training a model with the wordrep tool in MITIE to obtain a data set;
step S4: constructing a Rasa_Nlu corpus and a model for intention identification and entity identification;
step S5: acquiring the intention of the user.
2. The method for improving the entity identification rate based on the Rasa_Nlu framework of claim 1, wherein: in step S4, the intention identification classifies the input at the sentence level to determine the intention, and the entity identification finds the key entities in the user's question at the word level and fills the entity slots.
3. A device for improving the entity identification rate based on the Rasa_Nlu framework, characterized by comprising:
an information input module, used for inputting voice;
an information acquisition and preprocessing module, used for acquiring voice information and preprocessing it;
a MITIE model training module, used for training a model to obtain a data set;
a Rasa_Nlu corpus and model construction module, used for intention identification and entity identification;
and an acquisition module, used for acquiring the intention of the user.
4. A system for improving the entity identification rate based on the Rasa_Nlu framework, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein: the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 2.
5. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program, when executed by a processor, implements the steps of the method of any of claims 1 to 2.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910923027.0A (published as CN110704623A) | 2019-09-27 | 2019-09-27 | Method, device, system and storage medium for improving entity identification rate based on Rasa_Nlu framework

Publications (1)

Publication Number | Publication Date
CN110704623A (en) | 2020-01-17

Family

ID=69198239

Country Status (1)

Country Link
CN (1) CN110704623A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564916A (en) * 2022-03-03 2022-05-31 山东新一代信息产业技术研究院有限公司 Method, device and medium for simplifying corpus addition and corpus tagging

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427722A (en) * 2018-02-09 2018-08-21 卫盈联信息技术(深圳)有限公司 intelligent interactive method, electronic device and storage medium
CN109146610A (en) * 2018-07-16 2019-01-04 众安在线财产保险股份有限公司 It is a kind of intelligently to insure recommended method, device and intelligence insurance robot device
CN109522393A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Intelligent answer method, apparatus, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
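王雅君: "Intelligent voice dialogue *** based on RASA" ("基于RASA的智能语音对话***"), China Master's Theses Full-text Database, Information Science and Technology *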

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117