CN111949765B - Semantic-based similar text searching method, system, device and storage medium - Google Patents


Publication number
CN111949765B
CN111949765B (application CN202010843746.4A)
Authority
CN
China
Prior art keywords
text
semantic
split
target
semantic features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010843746.4A
Other languages
Chinese (zh)
Other versions
CN111949765A (en)
Inventor
卓民
杨楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kaniu Technology Co ltd
Original Assignee
Shenzhen Kaniu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kaniu Technology Co ltd filed Critical Shenzhen Kaniu Technology Co ltd
Priority to CN202010843746.4A priority Critical patent/CN111949765B/en
Publication of CN111949765A publication Critical patent/CN111949765A/en
Application granted granted Critical
Publication of CN111949765B publication Critical patent/CN111949765B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis


Abstract

The embodiment of the invention discloses a semantic-based similar text searching method, system, device and storage medium. The method comprises the following steps: acquiring a target text; splitting the target text to obtain a plurality of first split texts; searching the first semantic feature of each first split text in a semantic feature table generated based on a preset database; acquiring a target semantic feature of the target text, wherein the target semantic feature is the average of the first semantic features of the plurality of first split texts; and obtaining, from the preset database, texts similar to the target text according to the target semantic feature. By incorporating semantics, the embodiment of the invention improves the accuracy of similar text search.

Description

Semantic-based similar text searching method, system, device and storage medium
Technical Field
The embodiment of the invention relates to text technology, and in particular to a semantic-based similar text searching method, system, device and storage medium.
Background
With the development of internet technology and the arrival of the information age, people acquire information in ever more varied ways, and scenarios that call for searching similar texts are especially widespread.
In the common existing approach to similar text search, keywords, synonyms and paraphrases serve as keys: a bag-of-words (BOW) index is built over the articles and then queried. Such a search incorporates neither semantics nor context, however, so its results are inaccurate.
Disclosure of Invention
The embodiment of the invention provides a semantic-based similar text searching method, system, device and storage medium, which improve the accuracy of similar text search by incorporating semantics.
To this end, an embodiment of the present invention provides a semantic-based similar text search method, which includes:
acquiring a target text;
splitting the target text to obtain a plurality of first split texts;
searching the first semantic feature of each first split text in a semantic feature table generated based on a preset database;
acquiring a target semantic feature of the target text, wherein the target semantic feature is the average of the first semantic features of the plurality of first split texts;
and obtaining, from the preset database, texts similar to the target text according to the target semantic feature.
Further, the obtaining the target text includes:
Acquiring the precision requirement input by a user;
Determining a semantic radius according to the precision requirement;
The searching the first semantic feature of each first split text in the semantic feature table generated based on the preset database comprises the following steps:
and searching the first semantic features of each first split text in a semantic feature table generated based on a preset database based on the semantic radius.
Further, before the target text is obtained, the method includes:
acquiring training texts in a preset database;
Splitting the training text to obtain a plurality of second split texts;
Inputting each second split text into a preset neural network model to obtain second semantic features of each second split text, wherein the second semantic features are matrixes formed by occurrence probabilities of the rest second split texts of the second split text within a preset semantic radius;
acquiring training semantic features of the training text, wherein the training semantic features are average values of a plurality of second semantic features;
And generating a semantic feature table according to the second semantic features.
Further, the inputting each of the second split texts into a preset neural network model to obtain the second semantic feature of each of the second split texts includes:
converting the second split text into a third split text based on one-hot coding;
and inputting each third split text into a preset neural network model to obtain a second semantic feature of each second split text.
Further, the obtaining, according to the target semantic feature, similar text similar to the target text from the preset database includes:
and obtaining similar texts similar to the target text from the preset database according to the target semantic features and the training semantic features.
Further, the obtaining similar text similar to the target text from the preset database according to the target semantic feature and the training semantic feature includes:
Obtaining similar semantic features, wherein the similar semantic features are training semantic features with the difference value between the training semantic features and the target semantic features being smaller than a first threshold value;
and obtaining similar texts from the preset database according to the similar semantic features.
Further, the neural network model is a Skip-Gram model based on Word2vec.
In one aspect, an embodiment of the present invention further provides a semantic-based similar text search system, where the system includes:
the text acquisition module is used for acquiring a target text;
the text splitting module is used for splitting the target text to obtain a plurality of first split texts;
the feature searching module is used for searching first semantic features of each first split text in a semantic feature table generated based on a preset database;
The feature acquisition module is used for acquiring target semantic features of the target text, wherein the target semantic features are average values of first semantic features of a plurality of first split texts;
and the text searching module is used for acquiring similar texts similar to the target text from the preset database according to the target semantic features.
In another aspect, an embodiment of the present invention further provides a computer device, including: one or more processors; and a storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a method as provided by any of the embodiments of the present invention.
In yet another aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as provided by any of the embodiments of the present invention.
According to the embodiment of the invention, a target text is acquired; the target text is split to obtain a plurality of first split texts; the first semantic feature of each first split text is found in a semantic feature table generated based on a preset database; the target semantic feature of the target text, namely the average of the first semantic features of the plurality of first split texts, is acquired; and texts similar to the target text are obtained from the preset database according to the target semantic feature, thereby improving the accuracy of similar text search by incorporating semantics.
Drawings
FIG. 1 is a flow chart of a semantic-based similar text search method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for generating a semantic feature table according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a semantic-based similar text search method according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a semantic-based similar text search system according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration and not of limitation. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Furthermore, the terms "first," "second," and the like, may be used herein to describe various directions, acts, steps, or elements, etc., but these directions, acts, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first module may be referred to as a second module, and similarly, a second module may be referred to as a first module, without departing from the scope of the application. Both the first module and the second module are modules, but they are not the same module. The terms "first," "second," and the like, are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the embodiments of the present application, the meaning of "plurality" is at least two, for example, two, three, etc., unless explicitly defined otherwise.
Embodiment One
As shown in fig. 1, a first embodiment of the present invention provides a semantic-based similar text searching method, which includes:
s110, acquiring a target text.
S120, splitting the target text to obtain a plurality of first split texts.
In this embodiment, a target text is obtained first. The target text may be an electronic book, web-page news, a journal article, a patent or the like; specifically, it is the text for which the user needs to find similar texts. After the target text is obtained, it is split into a plurality of first split texts, where a first split text may be a word or a single character; preferably, the jieba word segmentation method is used to split the target text into the plurality of first split texts.
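Steps S110 and S120 can be sketched minimally in Python. This is only an illustration: a naive whitespace split stands in for the jieba segmenter so the sketch needs no third-party dependency, and `split_text` and the sample sentence are hypothetical names, not from the patent.

```python
def split_text(text):
    # The patent splits the target text with the jieba segmenter; a naive
    # whitespace split stands in here so the sketch stays self-contained.
    return [token for token in text.split() if token]

target_text = "Chengdu is located in Sichuan Province"  # hypothetical target
first_split_texts = split_text(target_text)
print(first_split_texts)
```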
S130, searching first semantic features of each first split text in a semantic feature table generated based on a preset database.
S140, acquiring target semantic features of the target text, wherein the target semantic features are average values of the first semantic features of the plurality of first split texts.
In this embodiment, after a plurality of first split texts are obtained, the first semantic feature of each first split text may be found in a semantic feature table generated based on a preset database, where the semantic feature table is generated by training using text data in the preset database, and each word corresponds to a unique semantic feature, so that the first semantic feature of each first split text may be found in the semantic feature table. Further, the first semantic features of each first split text are summed and averaged to obtain an average value of all the first semantic features, and the average value is used as a target semantic feature of the whole target text, so that the text feature based on the semantic of the target text, namely the target semantic feature, is obtained.
S150, obtaining similar texts similar to the target text from the preset database according to the target semantic features.
In this embodiment, after the target semantic feature of the target text is obtained, texts similar to the target text may be obtained from the preset database according to the target semantic feature: the closer the semantic features of two texts, the more similar the two texts are.
According to the embodiment of the invention, a target text is acquired; the target text is split to obtain a plurality of first split texts; the first semantic feature of each first split text is found in a semantic feature table generated based on a preset database; the target semantic feature of the target text, namely the average of the first semantic features of the plurality of first split texts, is acquired; and texts similar to the target text are obtained from the preset database according to the target semantic feature, thereby improving the accuracy of similar text search by incorporating semantics.
Embodiment Two
As shown in fig. 2 and fig. 3, a second embodiment of the present invention provides a semantic-based similar text searching method, and the second embodiment of the present invention is further explained based on the first embodiment of the present invention.
In this embodiment, as shown in fig. 2, before executing the similar text searching method based on semantics, a semantic feature table generated based on a preset database needs to be obtained, which specifically includes:
S210, acquiring training texts in a preset database.
S220, splitting the training texts to obtain a plurality of second split texts.
S230, converting the second split text into a third split text based on one-hot coding.
S240, inputting each third split text into a preset neural network model to obtain second semantic features of each second split text, wherein the second semantic features are matrixes formed by occurrence probabilities of the rest second split texts of the second split text within a preset semantic radius.
S250, acquiring training semantic features of the training text, wherein the training semantic features are average values of a plurality of second semantic features.
S260, generating a semantic feature table according to the second semantic features.
In this embodiment, obtaining the semantic feature table requires training a neural network model with a large number of training texts in the preset database. First, the training texts in the preset database are acquired; there are multiple training texts, and each is split into a plurality of second split texts. To suit the neural network model, each second split text is further converted into a third split text based on one-hot encoding, and the third split texts are then input into a preset neural network model. The neural network model is a Skip-Gram model based on Word2vec, so the user must specify a semantic radius at input time; the Skip-Gram model then outputs, for each second split text, a unique matrix formed by the occurrence probabilities of the other second split texts within the preset semantic radius. This matrix is the second semantic feature of that second split text. Finally, the second semantic features and their corresponding second split texts are used to generate the semantic feature table, and the second semantic features within one training text are averaged to obtain the training semantic feature of that training text, which can be used in subsequent search and comparison.
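The one-hot conversion of step S230 can be sketched as below. The vocabulary and tokens are hypothetical; the encoding itself is the standard one the patent names.

```python
def one_hot(vocabulary, token):
    # Encode a second split text as a one-hot vector over the training
    # vocabulary -- the input format the Skip-Gram network expects.
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(token)] = 1
    return vector

vocab = ["Chengdu", "located in", "China", "Sichuan Province"]  # hypothetical
print(one_hot(vocab, "China"))  # only the position of "China" is set to 1
```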
Preferably, multiple semantic radii can be input during training to perform multiple training to address subsequent search requirements of different precision.
For example, one of the training texts reads: "Chengdu, also known as Jincheng and abbreviated as Rong, is located in the middle-east of Sichuan Province, China, in the west of the Sichuan Basin." The text is first split with the jieba word segmentation method into second split texts such as "Chengdu", "Jincheng", "abbreviated", "Rong", "located in", "China", "Sichuan Province", "middle-east part", "at", "Sichuan Basin" and "west"; each second split text is then converted into a third split text based on one-hot encoding, and each third split text is input into the preset neural network model to obtain the second semantic feature of each second split text.
Specifically, take the second split text "China" and set the semantic radius to 2. The lattice phrase formed by the other second split texts within this radius is shown in Table 1, and the corresponding second semantic features are shown in Table 2; the two tables correspond entry by entry. That is, 0.87 is the probability that "Rong" appears two positions before "China", 0.68 is the probability that "located in" appears one position before "China", 0.94 is the probability that "Sichuan Province" appears one position after "China", and 0.78 is the probability that "middle-east part" appears two positions after "China". The second semantic feature shown in Table 2 can thus be used to represent the second split text "China".
TABLE 1
Rong	located in
Sichuan Province	middle-east part
TABLE 2
0.87	0.68
0.94	0.78
TABLE 3
Jincheng	abbreviated	Rong	located in
Sichuan Province	middle-east part	at	Sichuan Basin
TABLE 4
0.68	0.75	0.87	0.68
0.94	0.78	0.67	0.61
Further, set the semantic radius to 4. The lattice phrase of "China" formed by the other second split texts within this radius is shown in Table 3, and the corresponding second semantic features are shown in Table 4: 0.68 is the probability that "Jincheng" appears four positions before "China", 0.75 the probability that "abbreviated" appears three positions before, 0.87 the probability that "Rong" appears two positions before, 0.68 the probability that "located in" appears one position before, 0.94 the probability that "Sichuan Province" appears one position after, 0.78 the probability that "middle-east part" appears two positions after, 0.67 the probability that "at" appears three positions after, and 0.61 the probability that "Sichuan Basin" appears four positions after "China". The second semantic feature shown in Table 4 can represent the second split text "China" more precisely.
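The lattice phrases of Tables 1 and 3 can be read off mechanically: they are just the tokens within the semantic radius on either side of the chosen split text. A sketch, with the example sentence's tokens transliterated (the function name is an assumption):

```python
def context_window(tokens, index, radius):
    # Collect the other split texts within `radius` positions before and
    # after the given one -- the context over which the Skip-Gram model
    # predicts occurrence probabilities.
    before = tokens[max(0, index - radius):index]
    after = tokens[index + 1:index + 1 + radius]
    return before, after

tokens = ["Chengdu", "Jincheng", "abbreviated", "Rong", "located in",
          "China", "Sichuan Province", "middle-east part", "at",
          "Sichuan Basin", "west"]
print(context_window(tokens, tokens.index("China"), 2))
# With radius 2 this reproduces Table 1's four context words.
```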
In this embodiment, as shown in fig. 3, the semantic-based similar text searching method includes:
S310, acquiring a target text.
S320, obtaining the precision requirement input by the user.
S330, determining the semantic radius according to the precision requirement.
S340, splitting the target text to obtain a plurality of first split texts.
S350, searching first semantic features of each first split text in a semantic feature table generated based on a preset database based on the semantic radius.
S360, acquiring target semantic features of the target text, wherein the target semantic features are average values of the first semantic features of the plurality of first split texts.
In this embodiment, the target text is split in the same way as the training texts, which is not repeated here. The larger the semantic radius of the target text, the higher the search precision and the stronger the representational power of the corresponding semantic features: a small semantic radius suits early coarse recall, while a large semantic radius suits tasks with higher precision requirements, such as fine ranking. The semantic radius may not, however, exceed the semantic radius used in training. After the user inputs a precision requirement, the corresponding semantic radius can be determined; if the user inputs several precision requirements, several semantic radii are determined accordingly. Since the semantic feature table holds different semantic features trained for different semantic radii, several target semantic features based on different semantic radii can finally be obtained.
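One way to honor the constraint that the query-time radius may not exceed a radius used in training is to clamp the requested radius to the trained ones. This mapping is an assumption for illustration; the patent only states the bound, not how the precision requirement is translated into a radius.

```python
def choose_radius(requested_radius, trained_radii):
    # Pick the largest trained semantic radius that does not exceed the
    # requested one; fall back to the smallest trained radius otherwise.
    # (Hypothetical policy -- the patent only requires the upper bound.)
    usable = [r for r in sorted(trained_radii) if r <= requested_radius]
    return usable[-1] if usable else min(trained_radii)

print(choose_radius(3, [2, 4]))  # a request of 3 falls back to the trained radius 2
```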
S370, obtaining similar semantic features, wherein the similar semantic features are training semantic features with differences from the target semantic features smaller than a first threshold.
S380, obtaining similar texts from the preset database according to the similar semantic features.
In this embodiment, after the target semantic feature of the target text is obtained, a similarity search can be performed in the preset database: the training semantic features of the training texts in the preset database are obtained, and those whose difference from the target semantic feature is smaller than the first threshold are taken as similar semantic features. Training texts similar to the target text are then obtained from the preset database according to the similar semantic features, completing the similarity search for the target text.
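Steps S370 and S380 amount to a threshold test over the stored training semantic features. A sketch, using Euclidean distance as the "difference value" (an assumption; the patent does not name a metric) and a hypothetical in-memory database:

```python
def find_similar_texts(target_feature, training_features, threshold):
    # Keep every training text whose training semantic feature differs
    # from the target semantic feature by less than the first threshold.
    similar = []
    for text, feature in training_features.items():
        diff = sum((a - b) ** 2 for a, b in zip(target_feature, feature)) ** 0.5
        if diff < threshold:
            similar.append(text)
    return similar

database = {"doc A": [0.4, 0.6], "doc B": [0.9, 0.1]}  # hypothetical database
print(find_similar_texts([0.41, 0.59], database, 0.1))
```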
Preferably, after each target text completes the search, the target text and the corresponding target semantic features are stored in a preset database as training texts for subsequent similar searches.
Embodiment Three
As shown in fig. 4, the third embodiment of the present invention provides a similar text search system 100 based on semantics, where the similar text search system 100 based on semantics provided in the third embodiment of the present invention can execute the similar text search method based on semantics provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the execution method. The semantic-based similar text search system 100 includes a text retrieval module 200, a text splitting module 300, a feature lookup module 400, a feature retrieval module 500, and a text search module 600.
Specifically, the text obtaining module 200 is configured to obtain a target text; the text splitting module 300 is configured to split the target text to obtain a plurality of first split texts; the feature searching module 400 is configured to find a first semantic feature of each of the first split texts in a semantic feature table generated based on a preset database; the feature acquisition module 500 is configured to acquire a target semantic feature of the target text, where the target semantic feature is an average value of first semantic features of a plurality of first split texts; the text search module 600 is configured to obtain, from the preset database, a similar text similar to the target text according to the target semantic feature.
In this embodiment, the semantic-based similar text search system 100 further includes an accuracy determination module 700 and a text training module 800. The neural network model is a Skip-Gram model based on Word2vec.
Specifically, the accuracy determining module 700 is configured to obtain an accuracy requirement input by a user; and determining a semantic radius according to the precision requirement. The feature searching module 400 is specifically configured to find, based on the semantic radius, a first semantic feature of each of the first split texts in a semantic feature table generated based on a preset database. The text training module 800 is configured to obtain training text in a preset database; splitting the training text to obtain a plurality of second split texts; inputting each second split text into a preset neural network model to obtain second semantic features of each second split text, wherein the second semantic features are matrixes formed by occurrence probabilities of the rest second split texts of the second split text within a preset semantic radius; acquiring training semantic features of the training text, wherein the training semantic features are average values of a plurality of second semantic features; and generating a semantic feature table according to the second semantic features. The text training module 800 is specifically configured to convert the second split text into a third split text based on one-hot encoding; and inputting each third split text into a preset neural network model to obtain a second semantic feature of each second split text.
Further, the text search module 600 is specifically configured to obtain, from the preset database, a similar text similar to the target text according to the target semantic feature and the training semantic feature. The text search module 600 is specifically further configured to obtain similar semantic features, where the similar semantic features are training semantic features with a difference value from the target semantic features being smaller than a first threshold; and obtaining similar texts from the preset database according to the similar semantic features.
Embodiment Four
Fig. 5 is a schematic structural diagram of a computer device 12 according to a fourth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in FIG. 5, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and performs data processing by running programs stored in the system memory 28, for example implementing the method provided by the embodiments of the present invention:
acquiring a target text;
splitting the target text to obtain a plurality of first split texts;
searching for first semantic features of each first split text in a semantic feature table generated based on a preset database;
acquiring target semantic features of the target text, wherein the target semantic features are average values of the first semantic features of the plurality of first split texts;
and obtaining, from the preset database according to the target semantic features, similar texts similar to the target text.
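Although the patent does not prescribe an implementation, the retrieval steps listed above (split, look up, average, compare) can be sketched as follows. All function and variable names here are hypothetical; whitespace splitting stands in for the unspecified splitting rule, and Euclidean distance stands in for the unspecified difference measure between semantic features:

```python
import numpy as np

def split_text(text):
    """Split the target text into first split texts.

    Whitespace tokenization is an assumption; the patent does not fix
    a concrete splitting rule.
    """
    return text.split()

def text_vector(text, feature_table):
    """Look up the first semantic feature of each split text in the
    semantic feature table and average them to obtain the target
    semantic feature."""
    feats = [feature_table[t] for t in split_text(text) if t in feature_table]
    return np.mean(feats, axis=0)

def find_similar(target_text, feature_table, database_vectors, threshold=1.0):
    """Return database texts whose stored (training) semantic feature
    differs from the target semantic feature by less than the threshold."""
    target = text_vector(target_text, feature_table)
    return [doc for doc, vec in database_vectors.items()
            if np.linalg.norm(vec - target) < threshold]
```

For example, with a two-word feature table and two database texts, `find_similar` returns only the text whose stored feature lies within the threshold of the averaged target feature.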
Embodiment Five
The fifth embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method provided by the embodiments of the present application:
acquiring a target text;
splitting the target text to obtain a plurality of first split texts;
searching for first semantic features of each first split text in a semantic feature table generated based on a preset database;
acquiring target semantic features of the target text, wherein the target semantic features are average values of the first semantic features of the plurality of first split texts;
and obtaining, from the preset database according to the target semantic features, similar texts similar to the target text.
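The claims below additionally describe converting each split text into a one-hot coded form (the "third split texts") before it enters the preset neural network model. A minimal sketch of that conversion, with a hypothetical helper name and a sorted vocabulary ordering assumed:

```python
import numpy as np

def one_hot_encode(split_texts):
    """Map each distinct split text to a one-hot vector over the vocabulary.

    The sorted vocabulary ordering is an assumption; the patent only
    requires one-hot coding, not a particular index assignment.
    """
    vocab = sorted(set(split_texts))
    identity = np.eye(len(vocab))
    return {word: identity[i] for i, word in enumerate(vocab)}
```

Each resulting vector would then be fed to the neural network model to produce the second semantic features.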
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to them and may include many other equivalent embodiments without departing from the spirit of the invention, the scope of which is determined by the scope of the appended claims.

Claims (9)

1. A semantic-based similar text search method, comprising:
acquiring training texts in a preset database;
splitting the training text to obtain a plurality of second split texts;
inputting each second split text into a preset neural network model to obtain second semantic features of each second split text, wherein each second semantic feature is a matrix formed by the occurrence probabilities of the remaining second split texts within a preset semantic radius of that second split text;
acquiring training semantic features of the training text, wherein the training semantic features are average values of the plurality of second semantic features;
generating a semantic feature table according to the second semantic features;
acquiring a target text, wherein the target text is a text for which a user needs to obtain similar texts; the target text comprises electronic books, web page news, journals, and patents;
splitting the target text to obtain a plurality of first split texts;
searching for first semantic features of each first split text in the semantic feature table generated based on the preset database;
acquiring target semantic features of the target text, wherein the target semantic features are average values of the first semantic features of the plurality of first split texts;
and obtaining, from the preset database according to the target semantic features, similar texts similar to the target text.
2. The method of claim 1, wherein the obtaining the target text comprises:
acquiring a precision requirement input by a user;
determining a semantic radius according to the precision requirement;
the searching for first semantic features of each first split text in the semantic feature table generated based on the preset database comprises:
searching for the first semantic features of each first split text in the semantic feature table generated based on the preset database, based on the semantic radius.
3. The method of claim 1, wherein said inputting each of the second split texts into a preset neural network model to obtain a second semantic feature of each of the second split texts comprises:
converting each second split text into a third split text based on one-hot coding;
and inputting each third split text into the preset neural network model to obtain the second semantic features of each second split text.
4. The method according to claim 1, wherein the obtaining, from the preset database according to the target semantic features, similar texts similar to the target text comprises:
obtaining, from the preset database, similar texts similar to the target text according to the target semantic features and the training semantic features.
5. The method of claim 4, wherein the obtaining, from the preset database, similar texts similar to the target text according to the target semantic features and the training semantic features comprises:
obtaining similar semantic features, wherein the similar semantic features are training semantic features whose difference from the target semantic features is smaller than a first threshold;
and obtaining similar texts from the preset database according to the similar semantic features.
6. The method of claim 1, wherein the neural network model is a Skip-Gram model based on Word2Vec.
7. A semantic-based similar text search system, comprising:
The text training module is used for acquiring a training text in a preset database; splitting the training text to obtain a plurality of second split texts; inputting each second split text into a preset neural network model to obtain second semantic features of each second split text, wherein each second semantic feature is a matrix formed by the occurrence probabilities of the remaining second split texts within a preset semantic radius of that second split text; acquiring training semantic features of the training text, wherein the training semantic features are average values of the plurality of second semantic features; and generating a semantic feature table according to the second semantic features;
the text acquisition module is used for acquiring a target text, wherein the target text is a text for which a user needs to obtain similar texts; the target text comprises electronic books, web page news, journals, and patents;
the text splitting module is used for splitting the target text to obtain a plurality of first split texts;
the feature searching module is used for searching for first semantic features of each first split text in the semantic feature table generated based on the preset database;
The feature acquisition module is used for acquiring target semantic features of the target text, wherein the target semantic features are average values of first semantic features of a plurality of first split texts;
and the text searching module is used for acquiring similar texts similar to the target text from the preset database according to the target semantic features.
8. A computer device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN202010843746.4A 2020-08-20 2020-08-20 Semantic-based similar text searching method, system, device and storage medium Active CN111949765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010843746.4A CN111949765B (en) 2020-08-20 2020-08-20 Semantic-based similar text searching method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN111949765A CN111949765A (en) 2020-11-17
CN111949765B true CN111949765B (en) 2024-06-14

Family

ID=73358906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010843746.4A Active CN111949765B (en) 2020-08-20 2020-08-20 Semantic-based similar text searching method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN111949765B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010072A (en) * 2021-04-27 2021-06-22 维沃移动通信(杭州)有限公司 Searching method and device, electronic equipment and readable storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101930462A (en) * 2010-08-20 2010-12-29 华中科技大学 Comprehensive body similarity detection method
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN107491518B (en) * 2017-08-15 2020-08-04 北京百度网讯科技有限公司 Search recall method and device, server and storage medium


Also Published As

Publication number Publication date
CN111949765A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN107992596B (en) Text clustering method, text clustering device, server and storage medium
US20210264109A1 (en) Stylistic Text Rewriting for a Target Author
CN114595333B (en) Semi-supervision method and device for public opinion text analysis
CN109614625B (en) Method, device and equipment for determining title text relevancy and storage medium
CN113495900B (en) Method and device for obtaining structured query language statement based on natural language
KR102254612B1 (en) method and device for retelling text, server and storage medium
CN110647614A (en) Intelligent question and answer method, device, medium and electronic equipment
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN113486178B (en) Text recognition model training method, text recognition method, device and medium
US20200012650A1 (en) Method and apparatus for determining response for user input data, and medium
CN113434636B (en) Semantic-based approximate text searching method, semantic-based approximate text searching device, computer equipment and medium
CN111291177A (en) Information processing method and device and computer storage medium
CN111597800B (en) Method, device, equipment and storage medium for obtaining synonyms
CN111259262A (en) Information retrieval method, device, equipment and medium
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
CN111738791B (en) Text processing method, device, equipment and storage medium
CN110704608A (en) Text theme generation method and device and computer equipment
CN115392235A (en) Character matching method and device, electronic equipment and readable storage medium
CN111949765B (en) Semantic-based similar text searching method, system, device and storage medium
CN111191011B (en) Text label searching and matching method, device, equipment and storage medium
CN113139558A (en) Method and apparatus for determining a multi-level classification label for an article
CN114742062B (en) Text keyword extraction processing method and system
CN114970467B (en) Method, device, equipment and medium for generating composition manuscript based on artificial intelligence
CN111552780B (en) Medical scene search processing method and device, storage medium and electronic equipment
CN114065727A (en) Information duplication eliminating method, apparatus and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant