CN116796730A - Text error correction method, device, equipment and storage medium based on artificial intelligence

Info

Publication number: CN116796730A
Application number: CN202310658097.4A
Authority: CN
Inventors: 孟繁烨
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-06-05
Filing date: 2023-06-05
Publication date: 2023-09-22

Abstract

The embodiments of the application belong to the fields of artificial intelligence and financial technology, and relate to an artificial-intelligence-based text error correction method comprising: performing sentence segmentation on open-source corpus data to obtain sentence data, and storing the sentence data in a non-relational database; taking the correct words in a confusion word dictionary as keywords; acquiring sentence corpus containing the keywords from the non-relational database; constructing a target sentence corpus based on the sentence corpus; constructing training data based on the confusion word dictionary and the target sentence corpus; training an initial text error correction model based on the training data to obtain a text error correction model; and performing error correction on text data to be corrected based on the text error correction model to generate an error correction result. The application also provides an artificial-intelligence-based text error correction device, computer equipment and a storage medium. Furthermore, the text error correction model of the application may be stored in a blockchain. The application can be applied to text error correction scenarios in the financial field, and can quickly and accurately correct the text data to be corrected.

Description

Text error correction method, device, equipment and storage medium based on artificial intelligence

Technical Field

The application relates to the technical field of artificial intelligence and the technical field of finance, in particular to a text error correction method, a text error correction device, computer equipment and a storage medium based on artificial intelligence.

Background

Chinese text error correction is an important basic capability in the field of natural language processing (NLP), used to detect and correct spelling errors in Chinese text. It plays an important role in many application scenarios, such as search in insurance companies and banks: search error correction can automatically correct a user's search query so that the input better matches what the user needs, and input error correction in intelligent question answering can automatically correct wrongly written words in the user's input, which helps to understand the user's intention more accurately and to serve the user better.

Existing text error correction methods use a rule engine to correct errors for specific scenarios. Because correction relies on the rule engine alone, the coverage of text error correction is low and text errors outside the rules cannot be handled; the rule engine is also prone to misjudgment, so the accuracy of text error correction is low.

Disclosure of Invention

The embodiments of the application aim to provide an artificial-intelligence-based text error correction method, device, computer equipment and storage medium, so as to solve the following technical problems: existing text error correction methods use a rule engine to correct errors for specific scenarios, but because correction relies on the rule engine alone, the coverage of text error correction is low and text errors outside the rules cannot be handled; the rule engine is also prone to misjudgment, so the accuracy of text error correction is low.

In order to solve the technical problems, the embodiment of the application provides a text error correction method based on artificial intelligence, which adopts the following technical scheme:

performing sentence segmentation processing on pre-acquired open-source corpus data to obtain corresponding sentence data, and storing the sentence data into a preset non-relational database;

acquiring a pre-constructed confusion word dictionary, and taking correct words in the confusion word dictionary as key words;

searching the non-relational database based on the keywords, and acquiring sentence corpus containing the keywords from the non-relational database;

constructing a target sentence corpus based on the sentence corpus;

constructing training data based on the confusion word dictionary and the target sentence corpus;

training a preset initial text error correction model based on the training data to obtain a trained text error correction model; the initial text error correction model is obtained by constructing a detection network and a correction network;

and carrying out error correction processing on the text data to be subjected to error correction based on the text error correction model, and generating error correction results corresponding to the text data to be subjected to error correction.

Further, the step of constructing the target sentence corpus based on the sentence corpus specifically includes:

acquiring a first number, namely the number of sentences in the sentence corpus;

judging whether the first quantity is smaller than a target quantity or not;

if yes, calculating a difference value between the target quantity and the first quantity;

searching in a preset search engine by using the keywords to obtain corresponding search results;

screening a first search sentence containing the keyword from the search result;

acquiring, from the first search sentences, a number of second search sentences equal to the difference value;

and constructing the target sentence corpus based on the second search sentence and the sentence corpus.

Further, the step of constructing training data based on the confusion word dictionary and the target sentence corpus specifically includes:

performing word replacement processing on target sentences contained in the target sentence corpus based on the confusion word dictionary to obtain replacement sentences corresponding to the target sentences;

the training data is constructed based on the target sentence and the replacement sentence.

Further, after the step of training the preset initial text correction model based on the training data to obtain a trained text correction model, the method further includes:

acquiring a data storage type corresponding to the text error correction model;

determining, from a preset blockchain, first storage blocks corresponding to the data storage type; wherein there are a plurality of first storage blocks;

determining a target storage block from the first storage blocks;

and storing the text error correction model into the target storage block.

Further, the step of determining the target storage block from the first storage block specifically includes:

obtaining the residual storage space of each first storage block;

acquiring the occupied storage space of the text error correction model, and generating a storage space threshold value based on the occupied storage space;

screening second storage blocks with residual storage space larger than the storage space threshold from all the first storage blocks;

acquiring the data storage success rate of each second storage block in a preset time period;

screening, from all the second storage blocks, third storage blocks whose data storage success rate is larger than a preset success rate threshold;

generating a storage score of each third storage block based on the number of uses, the use evaluation value and the data storage success rate of each third storage block in the preset time period;

and screening a fourth storage block with the largest storage score from all the third storage blocks, and taking the fourth storage block as the target storage block.

Further, the step of generating the storage score of each third storage block based on the number of uses, the use evaluation value and the data storage success rate of each third storage block in the preset time period specifically includes:

acquiring a designated number of uses, a designated use evaluation value and a designated data storage success rate of a designated storage block in the preset time period; wherein the designated storage block is any one of all the third storage blocks;

acquiring a first preset weight, a second preset weight and a third preset weight corresponding respectively to the designated number of uses, the designated use evaluation value and the designated data storage success rate;

and calling a preset calculation formula to calculate, based on the first preset weight, the second preset weight and the third preset weight, a designated storage score of the designated storage block from the designated number of uses, the designated use evaluation value and the designated data storage success rate.
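The storage block selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula of the application: the text only refers to "a preset calculation formula", so the weighted sum below is an assumed form, and the `StorageBlock` fields, the `space_margin` parameter and the example weights are hypothetical. The sketch applies the success-rate filter to the blocks that passed the space filter, since the success rate is only acquired for those blocks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class StorageBlock:
    """Hypothetical record describing one candidate storage block."""
    name: str
    free_space: float      # remaining storage space
    success_rate: float    # data storage success rate in the preset period
    use_count: int         # number of uses in the preset period
    use_rating: float      # use evaluation value in the preset period


def storage_score(block: StorageBlock, weights: Tuple[float, float, float]) -> float:
    """Assumed formula: weighted sum of uses, evaluation value and success rate."""
    w_count, w_rating, w_success = weights
    return (w_count * block.use_count
            + w_rating * block.use_rating
            + w_success * block.success_rate)


def pick_target_block(first_blocks: Sequence[StorageBlock],
                      model_size: float,
                      success_threshold: float,
                      weights: Tuple[float, float, float],
                      space_margin: float = 1.0) -> Optional[StorageBlock]:
    # Storage space threshold generated from the space occupied by the model.
    space_threshold = model_size * space_margin
    # Second blocks: enough remaining space.
    second = [b for b in first_blocks if b.free_space > space_threshold]
    # Third blocks: success rate above the preset threshold.
    third = [b for b in second if b.success_rate > success_threshold]
    if not third:
        return None
    # Fourth block: the third block with the largest storage score.
    return max(third, key=lambda b: storage_score(b, weights))
```

Returning `None` when no block passes both filters is a choice of this sketch; the text does not say what happens in that case.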

Further, before the step of obtaining the pre-constructed confusion word dictionary, the method further includes:

acquiring word data of a target field;

acquiring manually input labeling information corresponding to the word data;

labeling the word data based on the labeling information to obtain processed target word data;

and storing the target word data into a preset dictionary to obtain the confusion word dictionary.

In order to solve the technical problems, the embodiment of the application also provides a text error correction device based on artificial intelligence, which adopts the following technical scheme:

the first processing module is used for carrying out sentence segmentation processing on the pre-acquired open-source corpus data to obtain corresponding sentence data, and storing the sentence data into a preset non-relational database;

the first acquisition module is used for acquiring a pre-constructed confusion word dictionary and taking correct words in the confusion word dictionary as keywords;

the retrieval module is used for carrying out retrieval processing on the non-relational database based on the keywords, and acquiring sentence corpus containing the keywords from the non-relational database;

the first construction module is used for constructing a target sentence corpus based on the sentence corpus;

the second construction module is used for constructing training data based on the confusion word dictionary and the target sentence corpus;

the training module is used for training a preset initial text error correction model based on the training data to obtain a trained text error correction model; the initial text error correction model is obtained by constructing a detection network and a correction network;

and the error correction module is used for carrying out error correction processing on the text data to be corrected based on the text error correction model and generating error correction results corresponding to the text data to be corrected.

In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:

constructing a target sentence corpus based on the sentence corpus;

In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:

constructing a target sentence corpus based on the sentence corpus;

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

Firstly, sentence segmentation processing is performed on pre-acquired open-source corpus data to obtain corresponding sentence data, and the sentence data is stored in a preset non-relational database; then a pre-constructed confusion word dictionary is acquired, the correct words in the confusion word dictionary are taken as keywords, the non-relational database is searched based on the keywords, and sentence corpus containing the keywords is acquired from the non-relational database; a target sentence corpus is then constructed based on the sentence corpus; training data is subsequently constructed based on the confusion word dictionary and the target sentence corpus; a preset initial text error correction model is trained based on the training data to obtain a trained text error correction model; and finally, error correction processing is performed on the text data to be corrected based on the text error correction model, and an error correction result corresponding to the text data to be corrected is generated. By using open-source corpus data, a non-relational database and a confusion word dictionary, the embodiment of the application can quickly construct the large amount of training corpus data required for training a text error correction model, and can quickly and accurately perform error correction on the text data to be corrected based on the trained model, thereby identifying text errors both within and beyond the rules and improving the accuracy of text error correction.

Drawings

In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of an artificial intelligence based text error correction method in accordance with the present application;

FIG. 3 is a schematic diagram of one embodiment of an artificial intelligence based text error correction apparatus in accordance with the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the text error correction method based on artificial intelligence provided by the embodiment of the application is generally executed by a server/terminal device, and correspondingly, the text error correction device based on artificial intelligence is generally arranged in the server/terminal device.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of an artificial intelligence based text error correction method in accordance with the present application is shown. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs. The text error correction method based on artificial intelligence provided by the embodiment of the application can be applied to any scene needing text error correction, and can be applied to products of the scenes, such as financial text error correction in the field of financial insurance. The text error correction method based on artificial intelligence comprises the following steps:

Step S201, sentence segmentation processing is performed on pre-acquired open-source corpus data to obtain corresponding sentence data, and the sentence data are stored in a preset non-relational database.

In this embodiment, the electronic device (for example, the server/terminal device shown in FIG. 1) on which the artificial-intelligence-based text error correction method runs may obtain the open-source corpus data through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future. The open-source corpus data may specifically refer to data sets of open-source data on the Internet, for example the data sets contained in Baidu Baike, Chinese encyclopedias and Wikipedia. The sentence data is obtained by splitting the paragraphs in the open-source data sets at sentence level. The non-relational database may be an Elasticsearch (ES) database.
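Step S201 can be illustrated with a minimal sketch. It assumes that sentences end at common Chinese or Western sentence-final punctuation, and it uses a small in-memory class as a stand-in for the non-relational database; `split_sentences` and `InMemorySentenceStore` are hypothetical names introduced only for illustration, and a real deployment would index the sentences into a document store such as Elasticsearch instead.

```python
import re
from typing import Dict, List

# Assumed sentence boundary: one sentence-final punctuation mark.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(paragraph) if s.strip()]


class InMemorySentenceStore:
    """Toy stand-in for the non-relational database (illustration only)."""

    def __init__(self) -> None:
        self._docs: Dict[int, Dict[str, str]] = {}

    def index(self, sentence: str) -> int:
        """Store one sentence as a document and return its id."""
        doc_id = len(self._docs)
        self._docs[doc_id] = {"sentence": sentence}
        return doc_id

    def search(self, keyword: str) -> List[str]:
        """Return all stored sentences that contain the keyword."""
        return [d["sentence"] for d in self._docs.values()
                if keyword in d["sentence"]]


store = InMemorySentenceStore()
for sentence in split_sentences("今天天气很好。我去银行办理业务！你呢？"):
    store.index(sentence)
```

The `search` method corresponds to the keyword retrieval of step S203; a substring test is used here in place of a full-text query.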

Step S202, a pre-constructed confusion word dictionary is acquired, and the correct words in the confusion word dictionary are used as keywords.

In this embodiment, the confusion word dictionary includes word data of the target field, and each item of word data is labeled with corresponding information, where the labeled information may include correct words and wrong words. The confusion word dictionary may be built from day-to-day business accumulation: text corpus in the vertical field is collected and manually labeled to construct the dictionary.
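One possible in-memory shape for the confusion word dictionary is a mapping from each correct word to the wrong words it is confused with. The sketch below assumes the manual labels arrive as (correct word, wrong word) pairs; that input format, the function names and the sample entries are assumptions made for illustration, not the format used by the application.

```python
from typing import Dict, Iterable, List, Tuple


def build_confusion_dict(labeled_pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group manually labeled (correct, wrong) pairs by correct word."""
    confusion: Dict[str, List[str]] = {}
    for correct, wrong in labeled_pairs:
        wrongs = confusion.setdefault(correct, [])
        if wrong not in wrongs:  # skip duplicate labels
            wrongs.append(wrong)
    return confusion


def keywords_of(confusion: Dict[str, List[str]]) -> List[str]:
    """Step S202: the correct words serve as retrieval keywords."""
    return list(confusion)


# Hypothetical labeled entries.
pairs = [("眼镜", "眼睛"), ("账户", "帐户"), ("眼镜", "眼睛")]
confusion_dict = build_confusion_dict(pairs)
keywords = keywords_of(confusion_dict)
```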

Step S203, performing a search process on the non-relational database based on the keywords, and obtaining sentence corpus containing the keywords from the non-relational database.

In this embodiment, if the number of sentences in the sentence corpus is equal to the target number, the sentence corpus may be used directly as the target sentence corpus. If the number is greater than the target number, a number of sentences equal to the target number may be randomly selected from the sentence corpus and used as the target sentence corpus.

Step S204, constructing a target sentence corpus based on the sentence corpus.

In this embodiment, the specific implementation of constructing the target sentence corpus based on the sentence corpus is described in further detail in the specific embodiments below and is not repeated here. Existing Chinese text error correction technology lacks high-quality corpus: the current common practice is to randomly replace words in correct sentences with wrong words to construct a training corpus, without considering the authenticity of the sentences. The target sentence corpus obtained in this embodiment is open-source corpus data, for example corpus in real use obtained from Chinese Wikipedia, Baidu Baike queries or search engine searches, so its quality is higher than that of corpus obtained by other methods such as synonym replacement or foreign-language translation.

Step S205, constructing training data based on the confusion word dictionary and the target sentence corpus.

In this embodiment, the specific implementation process of constructing training data based on the confusion word dictionary and the target sentence corpus is described in further detail in the following specific embodiments, which will not be described herein.

Step S206, training a preset initial text error correction model based on the training data to obtain a trained text error correction model; the initial text error correction model is obtained by constructing a detection network and a correction network.

In this embodiment, the text error correction model may be a Soft-Masked BERT model. The text error correction model is formed by connecting two networks in series, a detection network and a correction network. The detection network uses a Bi-GRU network and predicts the probability that each character is a wrongly written character; the correction network uses a BERT model and predicts the probability of each correction. Specifically, the detection network fully learns the context information of the input and outputs, for each position i, the probability p_i that the character at that position is wrongly written; the larger the value, the more likely the position is wrong. A soft masking part sits between the detection network and the correction network: for each position, it multiplies the feature of the masking character by p_i, multiplies the original input word vector feature by 1 - p_i, and adds the two products to obtain the feature of that character. The feature of each character is then input into the correction network, which is a BERT-based sequence multi-class labeling model; the final feature representation of each character is obtained by a residual connection between the output of the last layer and the word vector feature. The loss function is obtained by weighting the losses of the detection network and the correction network.
In addition, the training data obtained by construction comprises a target sentence and a replacement sentence, the replacement sentence can be used as the input of the initial text error correction model, the target sentence is used as the output of the text error correction model, and the initial text error correction model is trained until the model converges, so that a trained text error correction model is obtained.
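The soft masking step and the weighted loss described above can be written out in a few lines. The sketch uses plain Python lists in place of tensors so that it runs without any deep learning library; `soft_mask`, `combined_loss` and the weight `lam` are illustrative names, and the exact weighting used by the application is not stated, so the convex combination below is an assumption.

```python
from typing import List


def soft_mask(embeddings: List[List[float]],
              mask_embedding: List[float],
              error_probs: List[float]) -> List[List[float]]:
    """Per position i: p_i * e_mask + (1 - p_i) * e_i."""
    if len(embeddings) != len(error_probs):
        raise ValueError("one error probability is needed per position")
    mixed = []
    for e_i, p_i in zip(embeddings, error_probs):
        mixed.append([p_i * m + (1.0 - p_i) * x
                      for m, x in zip(mask_embedding, e_i)])
    return mixed


def combined_loss(detection_loss: float, correction_loss: float,
                  lam: float = 0.8) -> float:
    """Assumed weighting of the two network losses; lam is hypothetical."""
    return lam * correction_loss + (1.0 - lam) * detection_loss
```

With p_i close to 0 the character keeps its original embedding, and with p_i close to 1 it is effectively replaced by the masking character, so the correction network sees a hard mask only where the detection network is confident of an error.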

And step S207, performing error correction processing on the text data to be subjected to error correction based on the text error correction model, and generating error correction results corresponding to the text data to be subjected to error correction.

In this embodiment, suppose for example that the text data to be corrected is a sentence meaning "my country is China" in which one character of the word "country" is wrongly written. The sentence is input into the text error correction model, and the detection network in the model outputs, for each position, the probability that the character is wrongly written; the probability output for the wrongly written character is relatively large. The soft masking part of the model combines the error probability of each position with the masking character feature and passes the result to the correction network, which outputs, from the confusion word dictionary, the character regarded as correct with the highest probability. Specifically, the wrongly written character in "country" is output as "home", thereby implementing text error correction.

Firstly, carrying out sentence segmentation processing on pre-acquired open-source corpus data to obtain corresponding sentence data, and storing the sentence data into a preset non-relational database; then obtaining a pre-built confusion word dictionary, taking correct words in the confusion word dictionary as key words, carrying out search processing on the non-relational database based on the key words, and obtaining sentence corpus containing the key words from the non-relational database; then constructing a target sentence corpus based on the sentence corpus; subsequently constructing training data based on the confusion word dictionary and the target sentence corpus; training a preset initial text error correction model based on the training data to obtain a trained text error correction model; and finally, carrying out error correction processing on the text data to be corrected based on the text error correction model, and generating an error correction result corresponding to the text data to be corrected. The text error correction method and the text error correction device can quickly construct a large amount of training corpus data required by training a text error correction model based on the use of open source corpus data, the non-relational database and the confusion word dictionary, and can quickly and accurately carry out error correction processing on the text data to be corrected based on the trained text error correction model so as to generate error correction results corresponding to the text data to be corrected, thereby realizing the identification of text error conditions inside and outside rules and improving the accuracy of text error correction.

In some alternative implementations, step S204 includes the steps of:

Acquiring a first number, namely the number of sentences in the sentence corpus.

Judging whether the first number is smaller than a target number.

In this embodiment, the value of the target number is not specifically limited, and may be generated according to actual service usage requirements.

If yes, calculating the difference value between the target quantity and the first quantity.

Searching in a preset search engine by using the keywords to obtain corresponding search results.

In this embodiment, the selection of the search engine is not particularly limited, and an existing open source search engine may be used.

Screening out first search sentences containing the keyword from the search results.

Acquiring, from the first search sentences, a number of second search sentences equal to the difference value.

In this embodiment, if the number of first search sentences is greater than the difference value, a number of second search sentences equal to the difference value may be randomly selected from the first search sentences.

In this embodiment, the target sentence corpus may be generated by incorporating the second search sentences into the sentence corpus. The target sentence corpus obtained in this embodiment is open-source corpus data, for example corpus in real use obtained from Chinese Wikipedia, Baidu Baike queries or search engine searches, so its quality is higher than that of corpus obtained by other methods such as synonym replacement or foreign-language translation.

In this implementation, the first number of sentences in the sentence corpus is acquired and it is judged whether the first number is smaller than the target number; if so, the difference value between the target number and the first number is calculated; the keywords are then searched in a preset search engine to obtain corresponding search results; first search sentences containing the keyword are screened out from the search results; a number of second search sentences equal to the difference value is subsequently acquired from the first search sentences; and finally the target sentence corpus is constructed based on the second search sentences and the sentence corpus. When the number of sentences obtained from the non-relational database is detected to be smaller than the target number, the keywords are automatically searched in the search engine and the corresponding second search sentences are screened out of the search results to supplement the sentence corpus, so that the required target sentence corpus is generated. This makes the generation of the target sentence corpus more automatic and more accurate, and the construction of training data can be completed quickly based on the generated target sentence corpus.
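The three cases for constructing the target sentence corpus (equal to, greater than, or smaller than the target number) can be sketched as a single function. The search engine is passed in as a callable; `search_fn` and the other names are hypothetical, and no particular search engine API is implied.

```python
import random
from typing import Callable, List, Optional


def build_target_corpus(sentence_corpus: List[str],
                        keyword: str,
                        target_count: int,
                        search_fn: Callable[[str], List[str]],
                        rng: Optional[random.Random] = None) -> List[str]:
    rng = rng or random.Random()
    first_count = len(sentence_corpus)
    if first_count == target_count:
        return list(sentence_corpus)
    if first_count > target_count:
        # Randomly keep exactly target_count sentences.
        return rng.sample(sentence_corpus, target_count)
    # Too few sentences: top up from search results.
    difference = target_count - first_count
    # First search sentences: results that actually contain the keyword.
    first_search = [s for s in search_fn(keyword) if keyword in s]
    if len(first_search) > difference:
        second_search = rng.sample(first_search, difference)
    else:
        second_search = first_search
    return list(sentence_corpus) + second_search
```

If the search engine returns fewer matching sentences than the difference, this sketch simply uses all of them; the text does not specify that case.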

In some alternative implementations of the present embodiment, step S205 includes the steps of:

Performing word replacement processing on the target sentences contained in the target sentence corpus based on the confusion word dictionary to obtain replacement sentences corresponding to the target sentences.

In this embodiment, the word replacement processing means looking up, in the confusion word dictionary, the wrong word corresponding to the correct word (the keyword) in a target sentence contained in the target sentence corpus, and replacing the correct word at the keyword position with that wrong word, so as to obtain the processed replacement sentence.

In this embodiment, the target sentence before replacement may be taken as the target and the replacement sentence after replacement as the source; the positions at which the target sentence and the replacement sentence differ are marked 1 and the positions at which they are the same are marked 0, thereby obtaining the training data. Here, source denotes the string to be searched (the model input) and target denotes the replacement string (the expected output). Commonly used Chinese text error correction techniques often correct in one direction only. For example, during model training a sentence meaning "I got fitted with a pair of eyes" is used as the model input and "I got fitted with a pair of glasses" as the model output; the model learns to correct "eyes" to "glasses" but never learns the contexts in which "eyes" is itself the correct word, so miscorrection easily occurs. The text error correction model in this implementation corrects in both directions: it considers not only the contexts in which the wrong word should be corrected to the correct word, but also the contexts in which the so-called wrong word is itself correct, so the text error correction model trained in this embodiment has stronger generalization capability.

In the application, word replacement processing is performed on the target sentences contained in the target sentence corpus based on the confusion word dictionary to obtain replacement sentences corresponding to the target sentences, and the training data is constructed based on the target sentences and the replacement sentences. This augments and expands the target sentence corpus, quickly yields a large amount of high-quality Chinese text error correction corpus, lays a foundation for training the subsequent text error correction model, and addresses the problem of overfitting of the text error correction model.
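The construction of training data described above can be sketched as follows. This is a minimal illustration only, assuming that the confusion word dictionary is a mapping from each correct word to a list of wrong words and that a correct word and its wrong word have the same length, so that positions can be compared one to one. The function name `build_training_pairs` and the sample dictionary entry are hypothetical and are not taken from the application.

```python
def build_training_pairs(target_sentences, confusion_dict):
    """Build (source, target, labels) examples by word replacement.

    confusion_dict maps a correct word to a list of wrong words
    (an assumed layout, chosen for illustration).
    """
    examples = []
    for target in target_sentences:
        for correct, wrong_words in confusion_dict.items():
            if correct not in target:
                continue
            for wrong in wrong_words:
                # Assumption: only same-length pairs are used, so that
                # source and target align character by character.
                if len(wrong) != len(correct):
                    continue
                source = target.replace(correct, wrong)
                # 1 marks a position where source and target differ, 0 otherwise.
                labels = [int(s != t) for s, t in zip(source, target)]
                examples.append(
                    {"source": source, "target": target, "labels": labels}
                )
    return examples


# Illustrative entry: "眼镜" (glasses) is the correct word, "眼睛" (eyes) the wrong word.
pairs = build_training_pairs(["我配了一副眼镜"], {"眼镜": ["眼睛"]})
```

Because the dictionary may also list "眼睛" as a correct word with "眼镜" as its wrong word, sentences in which "眼睛" is correct would produce examples in the opposite direction, which is one way to read the bidirectional correction described above.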

In some alternative implementations, after step S206, the electronic device may further perform the following steps:

Acquiring a data storage type corresponding to the text error correction model.

In this embodiment, the data storage type may refer to a model data storage type.

Determining first storage blocks corresponding to the data storage type from a preset blockchain, wherein there are a plurality of first storage blocks.

In this embodiment, storage block information corresponding to the data storage type may be queried from a preset data storage mapping table, and the first storage blocks corresponding to the data storage type may then be determined from the blockchain based on the storage block information. The data storage mapping table is constructed in advance and records, for each data storage type, the storage blocks in the blockchain that are used to store data of that type. Specifically, the storage blocks included in the blockchain are classified in advance by data storage type according to actual service requirements, generating a correspondence between data storage types and storage blocks. Based on this correspondence, each storage block is used to store information matching its corresponding data storage type. In addition, the same storage block may correspond to several different data storage types at the same time.

Determining a target storage block from the first storage blocks.

In this embodiment, the specific process of determining the target storage block from the first storage blocks is described in detail in the embodiments below and is not repeated here.

Storing the text error correction model into the target storage block.

The application acquires the data storage type corresponding to the text error correction model, determines the first storage blocks corresponding to the data storage type from a preset blockchain, determines a target storage block from the first storage blocks, and stores the text error correction model into the target storage block. Because the text error correction model is stored in the target storage block that corresponds to its data storage type, the standardization and intelligence of storing the text error correction model are improved.
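The mapping-table lookup described above can be illustrated with a small sketch. It assumes the data storage mapping table is a plain in-memory mapping from a data storage type to a list of storage block identifiers; the table contents, the block identifiers and the function name `find_first_storage_blocks` are hypothetical, and no real blockchain API is involved.

```python
# Hypothetical mapping table: data storage type -> IDs of storage blocks.
# "block-02" appears under two types, reflecting that one storage block
# may correspond to several data storage types.
STORAGE_MAPPING_TABLE = {
    "model": ["block-01", "block-02", "block-07"],
    "corpus": ["block-02", "block-03"],
}


def find_first_storage_blocks(storage_type, mapping_table=STORAGE_MAPPING_TABLE):
    """Return the candidate ("first") storage blocks for a data storage type."""
    if storage_type not in mapping_table:
        raise ValueError(f"no storage blocks registered for type {storage_type!r}")
    # Return a copy so callers cannot modify the table by accident.
    return list(mapping_table[storage_type])


first_blocks = find_first_storage_blocks("model")
```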

In some optional implementations, determining the target storage block from the first storage blocks includes the following steps:

Obtaining the remaining storage space of each first storage block.

Acquiring the occupied storage space of the text error correction model, and generating a storage space threshold based on the occupied storage space.

In this embodiment, a preset storage space elasticity value is obtained, the sum of the occupied storage space and the storage space elasticity value is calculated, and the sum is used as the storage space threshold. The storage space elasticity value is not specifically limited and may be determined from actual data tests. It indicates that, when the remaining storage space of a storage block is larger than the storage space elasticity value, storing data in the storage block will not affect its normal operation; if the remaining storage space of the storage block is smaller than the storage space elasticity value, storing data in the storage block will affect its normal operation, and thus the storage efficiency and storage success rate of the storage block.

Screening, from all the first storage blocks, second storage blocks whose remaining storage space is larger than the storage space threshold.

Acquiring the data storage success rate of each second storage block within a preset time period.

In this embodiment, the preset time period is not specifically limited and may be set according to actual service requirements; for example, it may be the week before the current time.

Screening, from all the first storage blocks, third storage blocks whose data storage success rate is larger than a preset success rate threshold.

In this embodiment, the value of the success rate threshold is not specifically limited and may be set according to actual requirements.

Generating a storage score of each third storage block based on the number of uses, the use evaluation value and the data storage success rate of each third storage block within the preset time period.

In this embodiment, the specific process of generating the storage score of each third storage block based on its number of uses, use evaluation value and data storage success rate within the preset time period is described in detail in the embodiments below and is not repeated here.

Screening, from all the third storage blocks, a fourth storage block with the largest storage score, and taking the fourth storage block as the target storage block.

In this embodiment, the remaining storage space of the obtained target storage block is sufficient to store the text error correction model, and the target storage block is one that users commonly use for storing data, is rated highly by users, and rarely fails to store data. User satisfaction can therefore be ensured, and the intelligence and stability of storing the text error correction model are improved.

In the application, condition screening is performed sequentially on the first storage blocks by acquiring the remaining storage space, data storage success rate, number of uses and use evaluation value of each first storage block in the blockchain, and the target storage block is finally determined. The storage block for the generated text error correction model is thus selected reasonably according to the storage and usage conditions of the storage blocks, which effectively ensures the accuracy of the obtained target storage block, improves storage efficiency and storage intelligence, makes it convenient for users to query the model quickly, and improves the user experience.
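The screening procedure above can be sketched as a chain of filters followed by a maximum over the storage score. This is a minimal sketch under stated assumptions: the `StorageBlock` fields, the function name `select_target_block` and the `score` callable are hypothetical; the filters are applied one after another, following the "sequential screening" wording; and the function returns `None` when no block passes, a case the text does not discuss.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StorageBlock:
    block_id: str
    remaining_space: int  # remaining storage space, e.g. in bytes
    success_rate: float   # data storage success rate in the preset period
    use_count: int        # number of uses in the preset period
    use_rating: float     # use evaluation value in the preset period


def select_target_block(
    first_blocks: List[StorageBlock],
    model_size: int,
    elasticity: int,
    success_threshold: float,
    score: Callable[[StorageBlock], float],
) -> Optional[StorageBlock]:
    # Storage space threshold = occupied space of the model + elasticity value.
    space_threshold = model_size + elasticity
    # Second storage blocks: enough remaining space.
    second = [b for b in first_blocks if b.remaining_space > space_threshold]
    # Third storage blocks: success rate above the preset threshold
    # (assumption: screened from the second storage blocks).
    third = [b for b in second if b.success_rate > success_threshold]
    if not third:
        return None
    # Fourth storage block: the one with the largest storage score.
    return max(third, key=score)


blocks = [
    StorageBlock("a", 100, 0.99, 10, 4.0),
    StorageBlock("b", 500, 0.90, 50, 4.5),
    StorageBlock("c", 500, 0.99, 20, 4.2),
    StorageBlock("d", 600, 0.98, 5, 3.0),
]
chosen = select_target_block(
    blocks, model_size=200, elasticity=50,
    success_threshold=0.95, score=lambda b: b.use_count,
)
```

Passing the score as a callable keeps the screening logic separate from whatever scoring formula is configured.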

In some optional implementations of this embodiment, the generating the storage score of each third storage block based on the number of times each third storage block is used in the preset time period, the usage evaluation value, and the data storage success rate includes the following steps:

Acquiring the specified number of uses, the specified use evaluation value and the specified data storage success rate of a specified storage block within the preset time period, wherein the specified storage block is any one of all the third storage blocks.

Acquiring a first preset weight, a second preset weight and a third preset weight that respectively correspond to the specified number of uses, the specified use evaluation value and the specified data storage success rate.

In this embodiment, the values of the first preset weight, the second preset weight and the third preset weight are not specifically limited, and may be set according to actual requirements or obtained by simulation on a large amount of data.

Calling a preset calculation formula to calculate on the specified number of uses, the specified use evaluation value and the specified data storage success rate based on the first preset weight, the second preset weight and the third preset weight, to obtain a specified storage score of the specified storage block.

In this embodiment, the above-mentioned preset calculation formula may refer to a weighted summation formula.

In the application, the number of uses, use evaluation value and data storage success rate of each third storage block within the preset time period are evaluated with the preset calculation formula, so that the storage score of each third storage block is generated quickly and accurately. The target storage block can then be determined accurately and quickly from all the third storage blocks based on the storage scores, and the generated text error correction model can be stored in it. This improves the storage efficiency and storage intelligence of the text error correction model, makes it convenient for users to query the model quickly, and improves the user experience.
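Since the preset calculation formula may be a weighted summation, the storage score can be sketched as below. The function name `storage_score` and the default weights are hypothetical, chosen only for illustration. The sketch also combines the three quantities on their raw scales; in practice they may need to be normalized first, which the text does not specify.

```python
def storage_score(use_count, use_rating, success_rate, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of number of uses, use evaluation value and success rate.

    weights holds the first, second and third preset weights; the default
    values are placeholders, not values from the application.
    """
    w1, w2, w3 = weights
    return w1 * use_count + w2 * use_rating + w3 * success_rate


example_score = storage_score(10, 4.0, 0.5, weights=(0.1, 0.5, 1.0))
```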

In some optional implementations of this embodiment, before step S202, the electronic device may further perform the following steps:

Acquiring word data of the target field.

In this embodiment, the target field is not limited and may be set according to actual business requirements, for example the financial field, the insurance field, the banking field, and the like. A particular vertical field has specialized words of its own, which tend to be misrecognized or misspelled. For example, "Guard Euphoria" is an insurance product whose name is often misrecognized as a similar-sounding phrase by general speech recognition or a pinyin input method, although both the product name and the similar-sounding phrase are correct words in their respective contexts.

Acquiring manually input labeling information corresponding to the word data.

In this embodiment, when the confusion word dictionary is constructed, the relevant user may input labeling information corresponding to the word data according to actual service requirements. Continuing the above example, both the product name and the similar-sounding phrase can be used as correct words, each serving as the wrong word of the other. By contrast, for a pair consisting of "nationality" and a misspelled form of it, the misspelled form results from a complete spelling error and is wrong in any context, so it can only be used as the wrong word of "nationality".

Labeling the word data based on the labeling information to obtain processed target word data.

In this embodiment, the word data is labeled based on the labeling information, so that the word data and the labeling information have an association relationship.

Storing the target word data into a preset dictionary to obtain the confusion word dictionary.

In this embodiment, the dictionary may be a dictionary specially constructed in advance for storing the confusion words.

The application acquires word data of the target field, acquires manually input labeling information corresponding to the word data, labels the word data based on the labeling information to obtain processed target word data, and stores the target word data into a preset dictionary to obtain the confusion word dictionary. Because the word data of the target field is labeled manually, the required confusion word dictionary can be generated quickly, and the required training data can then be constructed quickly and accurately using the confusion word dictionary.
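One possible way to turn labeled word data into a confusion word dictionary is sketched below. The labeling format, a tuple of correct word, wrong word and a flag saying whether the pair is confusable in both directions, is an assumption made for illustration, as are the function name `build_confusion_dictionary` and the sample entries.

```python
from collections import defaultdict


def build_confusion_dictionary(labeled_words):
    """Map each correct word to a sorted list of its wrong words.

    labeled_words: iterable of (correct_word, wrong_word, bidirectional)
    tuples (an assumed labeling format).
    """
    dictionary = defaultdict(set)
    for correct, wrong, bidirectional in labeled_words:
        dictionary[correct].add(wrong)
        if bidirectional:
            # Both words are valid in some context, so each one is
            # registered as the wrong word of the other.
            dictionary[wrong].add(correct)
    return {word: sorted(wrongs) for word, wrongs in dictionary.items()}


# Illustrative entries: "眼镜"/"眼睛" are both real words, while "国藉"
# is a misspelling of "国籍" that is wrong in any context.
confusion = build_confusion_dictionary([
    ("眼镜", "眼睛", True),
    ("国籍", "国藉", False),
])
```

With this layout, a pure misspelling never becomes a key of the dictionary, so it is never used as a retrieval keyword.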

It is emphasized that to further guarantee the privacy and security of the text error correction model, the text error correction model may also be stored in a blockchain node.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by computer readable instructions stored in a computer readable storage medium, and that the instructions, when executed, may include the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an artificial intelligence based text error correction apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 3, the text error correction apparatus 300 based on artificial intelligence according to the present embodiment includes: a first processing module 301, a first acquisition module 302, a retrieval module 303, a first construction module 304, a second construction module 305, a training module 306, and an error correction module 307. Wherein:

the first processing module 301 is configured to perform sentence segmentation processing on pre-acquired open-source corpus data to obtain corresponding sentence data, and store the sentence data into a preset non-relational database;

a first acquisition module 302, configured to obtain a pre-constructed confusion word dictionary, and use correct words in the confusion word dictionary as keywords;

a retrieval module 303, configured to perform a retrieval process on the non-relational database based on the keyword, and obtain a sentence corpus containing the keyword from the non-relational database;

a first construction module 304, configured to construct a target sentence corpus based on the sentence corpus;

A second construction module 305, configured to construct training data based on the confusion word dictionary and the target sentence corpus;

the training module 306 is configured to train a preset initial text correction model based on the training data, so as to obtain a trained text correction model; the initial text error correction model is obtained by constructing a detection network and a correction network;

and the error correction module 307 is configured to perform error correction processing on the text data to be corrected based on the text error correction model, and generate an error correction result corresponding to the text data to be corrected.

In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the text error correction method based on artificial intelligence in the foregoing embodiment one by one, which is not described herein again.

In some alternative implementations of the present embodiment, the first construction module 304 includes:

the first acquisition submodule is used for acquiring a first number of the sentence corpus;

a judging sub-module, configured to judge whether the first number is smaller than a target number;

a calculating sub-module, configured to calculate, if yes, a difference between the target number and the first number;

the retrieval sub-module is used for retrieving in a preset search engine by using the keywords to obtain a corresponding retrieval result;

The first screening submodule is used for screening a first search statement containing the keyword from the search result;

a second obtaining sub-module, configured to obtain, from the first search sentences, second search sentences whose number is equal to the difference;

the first construction submodule is used for constructing the target sentence corpus based on the second search sentence and the sentence corpus.
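The cooperation of these submodules can be sketched as a single function. This is a minimal sketch, assuming the preset search engine is represented by a caller-supplied function `search(keyword)` that returns a list of sentences; that callable, the function name `build_target_corpus` and the sample data are hypothetical.

```python
def build_target_corpus(sentence_corpus, keyword, target_number, search):
    """Top up the sentence corpus with search results when it is too small."""
    first_number = len(sentence_corpus)
    # Judge whether the first number is smaller than the target number.
    if first_number >= target_number:
        return list(sentence_corpus)
    difference = target_number - first_number
    # Retrieve with the keyword, then keep only sentences that contain it.
    first_search = [s for s in search(keyword) if keyword in s]
    # Take as many second search sentences as the difference (or fewer,
    # if the search did not return enough).
    second_search = first_search[:difference]
    return list(sentence_corpus) + second_search


def fake_search(keyword):
    # Stand-in for a real search engine; returns fixed illustrative results.
    return ["他戴着眼镜看书", "今天下雨了", "这副眼镜很贵", "眼镜店在街角"]


corpus = build_target_corpus(["我配了一副眼镜"], "眼镜", 3, fake_search)
```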

In some alternative implementations of the present embodiment, the second construction module 305 includes:

a replacement sub-module, configured to perform word replacement processing on a target sentence included in the target sentence corpus based on the confusion word dictionary, to obtain a replacement sentence corresponding to the target sentence;

and the second construction submodule is used for constructing the training data based on the target sentence and the replacement sentence.

In some optional implementations of this embodiment, the artificial intelligence based text error correction apparatus further includes:

the second acquisition module is used for acquiring a data storage type corresponding to the text error correction model;

the first determining module is used for determining first storage blocks corresponding to the data storage type from a preset blockchain, wherein there are a plurality of first storage blocks;

the second determining module is used for determining a target storage block from the first storage block;

and the first storage module is used for storing the text error correction model into a target storage block.

In some optional implementations of this embodiment, the second determining module includes:

the third acquisition submodule is used for acquiring the residual storage space of each first storage block;

a fourth obtaining sub-module, configured to obtain an occupied storage space of the text error correction model, and generate a storage space threshold based on the occupied storage space;

The second screening submodule is used for screening second storage blocks with residual storage space larger than the storage space threshold from all the first storage blocks;

a fifth obtaining sub-module, configured to obtain a success rate of data storage of each second storage block in a preset time period;

the third screening submodule is used for screening a third storage block with the data storage success rate larger than a preset success rate threshold from all the first storage blocks;

the generation submodule is used for generating a storage score of each third storage block based on the use times, the use evaluation values and the data storage success rate of each third storage block in the preset time period;

and the fourth screening submodule is used for screening a fourth storage block with the largest storage score from all the third storage blocks, and taking the fourth storage block as the target storage block.

In some optional implementations of the present embodiment, the generation submodule includes:

A first obtaining unit, configured to obtain a specified use number, a specified use evaluation value, and a specified data storage success rate of a specified storage block in the preset time period; wherein the designated storage block is any one of all the third storage blocks;

a second obtaining unit, configured to obtain a first preset weight, a second preset weight, and a third preset weight that respectively correspond to the specified use times, the specified use evaluation value, and the specified data storage success rate;

and the calculation unit is used for calling a preset calculation formula to calculate on the specified number of uses, the specified use evaluation value and the specified data storage success rate based on the first preset weight, the second preset weight and the third preset weight, so as to obtain the specified storage score of the specified storage block.

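In some optional implementations of this embodiment, the artificial intelligence based text error correction apparatus further includes:

the third acquisition module is used for acquiring word data in the target field;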

a fourth obtaining module, configured to obtain manually input labeling information corresponding to the word data;

the second processing module is used for marking the word data based on the marking information to obtain processed target word data;

and the second storage module is used for storing the target word data into a preset dictionary to obtain the confusion word dictionary.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 that are communicatively connected to each other via a system bus. It should be noted that only the computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or an internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of the artificial intelligence based text error correction method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as executing computer readable instructions of the artificial intelligence based text error correction method.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

In the embodiment of the application, sentence segmentation processing is first performed on pre-acquired open-source corpus data to obtain corresponding sentence data, and the sentence data is stored in a preset non-relational database. A pre-constructed confusion word dictionary is then obtained, the correct words in the confusion word dictionary are used as keywords, the non-relational database is searched based on the keywords, and sentence corpus containing the keywords is obtained from the non-relational database. A target sentence corpus is then constructed based on the sentence corpus, and training data is constructed based on the confusion word dictionary and the target sentence corpus. A preset initial text error correction model is subsequently trained based on the training data to obtain a trained text error correction model. Finally, error correction processing is performed on the text data to be corrected based on the text error correction model, and an error correction result corresponding to the text data to be corrected is generated. By using open-source corpus data, a non-relational database and a confusion word dictionary, the embodiment of the application can quickly construct the large amount of training corpus data required for training a text error correction model, and can quickly and accurately correct the text data to be corrected based on the trained text error correction model to generate an error correction result, thereby identifying text errors both inside and outside the rules and improving the accuracy of text error correction.
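The first two stages of this pipeline, sentence segmentation and keyword retrieval, can be illustrated with a short sketch. It makes several assumptions: sentences are split on common Chinese and Western sentence-ending punctuation, a plain Python list stands in for the non-relational database, and retrieval is a simple substring match. The function names `split_sentences` and `retrieve` are hypothetical.

```python
import re

# A sentence is a run of non-terminator characters followed by an
# optional sentence-ending punctuation mark.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text):
    """Split raw corpus text into sentences, keeping the end punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def retrieve(sentences, keyword):
    """Return the stored sentences that contain the keyword."""
    return [s for s in sentences if keyword in s]


# The list below stands in for the sentence data held in the database.
stored = split_sentences("今天天气好。我配了一副眼镜！他的眼睛很亮")
hits = retrieve(stored, "眼镜")
```

A real deployment would replace the list with queries against the chosen non-relational database.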

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of an artificial intelligence-based text error correction method as described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the patent claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims

1. The text error correction method based on artificial intelligence is characterized by comprising the following steps:

constructing a target sentence corpus based on the sentence corpus;

2. The text correction method based on artificial intelligence according to claim 1, wherein the step of constructing a target sentence corpus based on the sentence corpus specifically comprises:

Acquiring a first number of sentence corpora;

judging whether the first quantity is smaller than a target quantity or not;

3. The artificial intelligence based text correction method of claim 1, wherein the step of constructing training data based on the confusion word dictionary and the target sentence corpus specifically comprises:

4. The artificial intelligence based text correction method according to claim 1, further comprising, after the step of training a preset initial text correction model based on the training data to obtain a trained text correction model:

Acquiring a data storage type corresponding to the text error correction model;

determining a target storage block from the first storage block;

and storing the text error correction model into a target storage block.

5. The artificial intelligence based text error correction method according to claim 4, wherein the step of determining a target memory block from the first memory block comprises:

obtaining the residual storage space of each first storage block;

6. The text error correction method based on artificial intelligence according to claim 5, wherein the step of generating the storage score of each third storage block based on the number of times each third storage block is used in the preset time period, the use evaluation value, and the data storage success rate specifically comprises:

7. The artificial intelligence based text correction method of claim 1, further comprising, prior to the step of obtaining a pre-constructed confusion word dictionary:

acquiring word data of a target field;

acquiring manually input labeling information corresponding to the word data;

8. An artificial intelligence based text error correction apparatus, comprising:

9. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the artificial intelligence based text error correction method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based text error correction method of any of claims 1 to 7.