CN112749256A - Text processing method, device, equipment and storage medium

Text processing method, device, equipment and storage medium

Info

Publication number
CN112749256A
CN112749256A (application CN202011643798.3A)
Authority
CN
China
Prior art keywords
text information
feature vector
text
calculating
time warping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011643798.3A
Other languages
Chinese (zh)
Inventor
任亮
傅雨梅
黄新涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyin Intelligent Technology Co ltd
Original Assignee
Beijing Zhiyin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyin Intelligent Technology Co ltd filed Critical Beijing Zhiyin Intelligent Technology Co ltd
Priority to CN202011643798.3A
Publication of CN112749256A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Databases & Information Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text processing method, apparatus, device, and storage medium, wherein the method comprises the following steps: acquiring first text information and second text information to be processed; analyzing the first text information and the second text information respectively to obtain a first feature vector of the first text information and a second feature vector of the second text information; calculating a time warping distance between the first feature vector and the second feature vector; and calculating similarity information between the first text information and the second text information according to the time warping distance. The method and apparatus distinguish text information containing ambiguous words more accurately.

Description

Text processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a text processing method, apparatus, device, and storage medium.
Background
With the popularization and development of mobile intelligent terminal devices and social networks, a great deal of short text data (text with a character length under 200) such as news summaries, microblog messages, and product reviews has emerged, and how to mine commercially valuable information from this massive short text has become a focus of many Chinese natural language processing researchers. Chinese has a very large number of users, an abundant vocabulary, and flexible, varied ways of expressing meaning; for example, comparing the similarity of news summaries and clustering them can extract hot topics or retrieval keywords, helping users quickly learn of important news. Because short texts have few characters and sparse content yet carry rich semantic information in varied forms of expression, they play a large role in artificial intelligence fields such as machine translation, sentiment analysis, and information retrieval.
In the special short-text scenario of ambiguous words, a static word-vector model expresses each word in a single way, has difficulty combining dynamically with context, and cannot effectively express two or more kinds of word feature information through a low-dimensional dense word vector. For example, consider "I bought a bag of millet at the supermarket" and "Lei Jun launched the Xiaomi phone in Beijing": the Chinese word 小米 appears in both short texts but expresses different meanings; judging from the context of each, it denotes the grain millet in the first short text and the Xiaomi smartphone in the second. Because the same word with different senses appears in the texts, the two short texts are difficult to distinguish if the feature information the word expresses in its current context cannot be mined.
Disclosure of Invention
The embodiment of the application provides a text processing method which is used for accurately distinguishing text information with lexical ambiguity.
The embodiment of the application provides a text processing method, which comprises the following steps:
acquiring first text information and second text information to be processed;
respectively analyzing the first text information and the second text information to obtain a first feature vector of the first text information and a second feature vector of the second text information;
calculating a time warping distance between the first feature vector and the second feature vector;
and calculating the similarity information between the first text information and the second text information according to the time warping distance.
In an embodiment, the calculating a time warping distance between the first feature vector and the second feature vector comprises:
calculating the time warping distance between the first feature vector and the second feature vector by dynamic time warping (DTW), solved by dynamic programming.
In an embodiment, the calculating the time warping distance between the first feature vector and the second feature vector by dynamic time warping (DTW) includes: when the time warping distance is calculated, calculating a projection matrix of the first feature vector and a projection matrix of the second feature vector by canonical correlation analysis (CCA); wherein the projection matrix of the first feature vector and the projection matrix of the second feature vector are used to calculate the time warping distance.
In an embodiment, the calculating the similarity information between the first text information and the second text information according to the time warping distance includes:
calculating the similarity information by the following formula:
[formula not reproduced: the original is an image defining Sim(s1, s2) in terms of ctw(s1, s2)]
wherein s1 is the first text information, s2 is the second text information, ctw(s1, s2) represents the time warping distance between the first text information s1 and the second text information s2, and Sim(s1, s2) is the final similarity information.
In an embodiment, the analyzing the first text information and the second text information respectively to obtain a first feature vector of the first text information and a second feature vector of the second text information includes:
performing word segmentation on the first text information to obtain a first keyword set, and performing word segmentation on the second text information to obtain a second keyword set;
and respectively inputting the first keyword set into a preset feature recognition model, outputting the first feature vector, inputting the second keyword set into the preset feature recognition model, and outputting the second feature vector.
In one embodiment, the step of establishing the preset feature recognition model includes:
obtaining a sample corpus, wherein the sample corpus is labeled with text words and syntactic structural features;
and training a bidirectional encoder representation model with the sample corpus to obtain the preset feature recognition model.
In one embodiment, the bidirectional encoder representation model has 768 hidden-layer neurons.
A second aspect of the embodiments of the present application provides a text information processing apparatus, including:
an acquisition module configured to acquire first text information and second text information to be processed;
the analysis module is used for respectively analyzing the first text information and the second text information to obtain a first feature vector of the first text information and a second feature vector of the second text information;
a first calculation module for calculating a time warping distance between the first feature vector and the second feature vector;
and the second calculation module is used for calculating the similarity information between the first text information and the second text information according to the time warping distance.
A third aspect of embodiments of the present application provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the text processing method described above.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program executable by a processor to perform the text processing method described above.
According to the technical scheme provided by the embodiments of the application, a feature vector of each piece of text information is obtained by analyzing the pieces of text information; the time warping distance between each two pieces of text information is then calculated from the obtained feature vectors, and the similarity information between each two pieces of text information is calculated from the time warping distance, so that text information with lexical ambiguity is distinguished more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art may derive other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a text processing method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating the sub-steps of step 220 according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps of establishing a predetermined feature recognition model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a text information processing apparatus according to an embodiment of the present application.
Reference numerals:
100-an electronic device; 110-a bus; 120-a processor; 130-a memory; 500-a text information processing apparatus; 510-an obtaining module; 520-a resolution module; 530-a first calculation module; 540-second calculation module.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
In the description of the present application, the terms "first," "second," and the like are used for distinguishing between descriptions and do not denote an order of magnitude, nor are they to be construed as indicating or implying relative importance.
In the description of the present application, the terms "comprises," "comprising," and/or the like, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
In the description of the present application, the terms "mounted," "disposed," "provided," "connected," and "configured" are to be construed broadly unless expressly stated or limited otherwise. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be mechanically or electrically connected; either directly or indirectly through intervening media, or may be internal to two devices, elements or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Please refer to fig. 1, which is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 includes at least one processor 120 and a memory 130; fig. 1 takes one processor as an example. The processor 120 and the memory 130 are coupled by a bus 110, and the memory 130 stores instructions executable by the at least one processor 120; the instructions are executed by the at least one processor 120 to cause the at least one processor 120 to perform the text processing method of the embodiments described below.
In one embodiment, the processor 120 may be a general-purpose processor, including but not limited to a Central Processing Unit (CPU) or a Network Processor (NP), or may be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general-purpose processor may be a microprocessor or any conventional processor. The processor 120 is the control center of the electronic device 100 and connects the various parts of the entire electronic device 100 using various interfaces and lines. The processor 120 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application.
In one embodiment, the memory 130 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, including but not limited to Random Access Memory (RAM), Read-Only Memory (ROM), Static Random Access Memory (SRAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
In one embodiment, the electronic device 100 may also communicate with one or more external devices, such as a keyboard, a mouse, a bluetooth device, a pointing device, etc., to enable a user to interact with the electronic device 100.
The structure of the electronic device 100 shown in fig. 1 is merely illustrative, and the electronic device 100 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
As shown in fig. 2, which is a flowchart illustrating a text processing method according to an embodiment of the present application, the method may be executed by the electronic device 100 shown in fig. 1 to accurately distinguish text information with lexical ambiguity. The method comprises the following steps:
step 210: and acquiring first text information and second text information to be processed.
In the above steps, the first text information and the second text information include, but are not limited to: short text information such as news abstracts, microblog messages, commodity comments and the like.
Step 220: and respectively analyzing the first text information and the second text information to obtain a first feature vector of the first text information and a second feature vector of the second text information.
Step 230: a time warping distance between the first feature vector and the second feature vector is calculated.
The above step includes: calculating the time warping distance between the first feature vector and the second feature vector by Dynamic Time Warping (DTW), which is solved by dynamic programming; and, when calculating the time warping distance, calculating a projection matrix of the first feature vector and a projection matrix of the second feature vector by Canonical Correlation Analysis (CCA), the two projection matrices being used in calculating the time warping distance.
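As an illustrative sketch only (not part of the patent disclosure), the DTW recursion solved by dynamic programming can be written as follows; the function name, the NumPy dependency, the per-frame Euclidean cost, and the path backtracking are assumptions made for illustration. The returned path is reused in the CTW sketch later in this section.

```python
import numpy as np

def dtw_path(A, B):
    """Dynamic time warping by dynamic programming.

    A: (n, b) and B: (m, b) feature-vector sequences.
    Returns (accumulated alignment cost, list of aligned index pairs).
    """
    n, m = len(A), len(B)
    # pairwise Euclidean cost between every frame of A and every frame of B
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # classic DTW recursion: cheapest of match / insertion / deletion
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # backtrack the optimal warping path
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```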
In one implementation, the computation proceeds as follows. First step: input the first feature vector sequence X and the second feature vector sequence Y. Second step: initialize Vx = Idx and Vy = Idy, where Vx is the projection matrix of the first feature vector X, Vy is the projection matrix of the second feature vector Y, and Idx, Idy are identity matrices. Third step: compute the warping matrices Wx and Wy by dynamic programming, and execute this step in a loop; Wx and Wy are binary selection matrices that align sequence X with sequence Y and encode the alignment path. Fourth step: to align the two sequences, update the projection matrices Vx and Vy by CCA on the aligned samples XWx and YWy; their columns are the generalized eigenvectors of the CCA generalized eigenvalue problem, with b the dimension of the common subspace introduced by the projections. The loop continues until the time warping distance reaches its minimum, at which point the calculation ends and the minimum is output. The objective Jctw is calculated as:
Jctw(Wx, Wy, Vx, Vy) = || Vx' X Wx - Vy' Y Wy ||F^2,
where ' denotes the matrix transpose and || · ||F is the Frobenius norm.
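The alternation above can be sketched compactly as follows, reusing dtw_path from the earlier sketch. This is an illustration under stated assumptions, not the patent's reference implementation: sequences are stored row-wise (tokens by dimensions), a small ridge term keeps the CCA covariance matrices invertible, the y-side projections are recovered only up to per-column scale, and all helper names, the shared-subspace dimension b, and the fixed iteration count are illustrative.

```python
import numpy as np
# dtw_path is the DTW sketch shown earlier in this section

def cca_projections(Xa, Ya, b, reg=1e-6):
    """CCA on aligned samples Xa (t, dx), Ya (t, dy); returns Vx (dx, b), Vy (dy, b)."""
    Xa = Xa - Xa.mean(axis=0)
    Ya = Ya - Ya.mean(axis=0)
    Cxx = Xa.T @ Xa + reg * np.eye(Xa.shape[1])
    Cyy = Ya.T @ Ya + reg * np.eye(Ya.shape[1])
    Cxy = Xa.T @ Ya
    # top eigenvectors of Cxx^{-1} Cxy Cyy^{-1} Cyx give the x-side projections
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    top = np.argsort(-vals.real)[:b]
    Vx = vecs[:, top].real
    Vy = np.linalg.solve(Cyy, Cxy.T) @ Vx  # y-side projections, up to per-column scale
    return Vx, Vy

def ctw_distance(X, Y, b=2, iters=10):
    """Alternate DTW alignment (Wx, Wy) and CCA projection (Vx, Vy)."""
    Vx = np.eye(X.shape[1])[:, :b]  # initialized from the identity, as in the text
    Vy = np.eye(Y.shape[1])[:, :b]
    dist = np.inf
    for _ in range(iters):
        dist, path = dtw_path(X @ Vx, Y @ Vy)      # alignment step (dynamic programming)
        ix, iy = map(list, zip(*path))
        Vx, Vy = cca_projections(X[ix], Y[iy], b)  # projection step (CCA)
    return dist

# Usage: X, Y are (num_tokens, 768) feature sequences for the two texts.
# d = ctw_distance(X, Y, b=2)
```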
Step 240: calculate the similarity information between the first text information and the second text information according to the time warping distance.
In the above step, the similarity information is calculated by the following formula:
[formula not reproduced: the original is an image defining Sim(s1, s2) in terms of ctw(s1, s2)]
where s1 is the first text information, s2 is the second text information, ctw(s1, s2) is the time warping distance between the first text information s1 and the second text information s2, and Sim(s1, s2) is the final similarity information.
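Because the patent's similarity formula survives only as an image reference, the sketch below uses 1 / (1 + d), a common way to map a non-negative distance into a similarity in (0, 1]; the exact functional form used in the patent is an assumption here.

```python
def similarity(ctw_dist: float) -> float:
    """Map a non-negative time warping distance to a similarity score.

    The patent's formula is an image that did not survive extraction;
    1 / (1 + d) is an assumed stand-in with the right qualitative
    behaviour: distance 0 gives similarity 1, and similarity decreases
    monotonically as the distance grows.
    """
    return 1.0 / (1.0 + ctw_dist)
```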
As shown in fig. 3, which is a schematic flowchart of the sub-steps of step 220 according to an embodiment of the present application, step 220 (analyzing the first text information and the second text information respectively to obtain a first feature vector of the first text information and a second feature vector of the second text information) may include:
step 221: and segmenting the first text information to obtain a first keyword set, and segmenting the second text information to obtain a second keyword set.
In the above steps, the model vocabulary may be loaded, a word segmentation device may be constructed, and then the word segmentation device may be used to perform word segmentation operation on the first text information and the second text information, so as to obtain a first keyword set and a second keyword set. And finally, performing part-of-speech tagging on the first keyword set and the second keyword set after the word segmentation operation by using a word segmentation device.
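The patent does not name a particular segmenter; as one concrete possibility, the jieba library performs the segmentation and part-of-speech tagging described above in a single pass. The library choice and the keyword filter below are assumptions.

```python
import jieba.posseg as pseg  # jieba: a widely used Chinese segmenter (assumed choice)

text = "我在超市买了一袋小米"  # "I bought a bag of millet (xiaomi) at the supermarket"
tagged = pseg.lcut(text)      # segmentation and part-of-speech tagging in one pass
print([(w.word, w.flag) for w in tagged])

# keep, e.g., nouns and verbs as the keyword set (an illustrative filter)
keywords = [w.word for w in tagged if w.flag.startswith(("n", "v"))]
```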
Step 222: input the first keyword set into a preset feature recognition model to output the first feature vector, and input the second keyword set into the same feature recognition model to output the second feature vector.
In the above step, each keyword is converted into a one-dimensional vector by looking it up in the word-vector table, and the first feature vector of the first text information and the second feature vector of the second text information are output; the first feature vector and the second feature vector are vector representations of each keyword that incorporate the semantic information of the full text.
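A sketch of obtaining per-keyword, context-aware vectors with a pretrained Chinese BERT through the Hugging Face transformers library follows; the checkpoint name bert-base-chinese is an assumption, as the patent only specifies a BERT-style model whose hidden size, and hence output dimension, is 768.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

def encode(text: str) -> torch.Tensor:
    """Return a (seq_len, 768) sequence of context-aware token vectors."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0]  # one 768-dim vector per token

vecs1 = encode("我在超市买了一袋小米")      # "millet" sense of 小米
vecs2 = encode("雷军在北京发布了小米手机")  # "Xiaomi phone" sense of 小米
# vecs1 / vecs2 can feed the DTW/CTW distance sketched earlier in this section
```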
As shown in fig. 4, which is a schematic flow chart illustrating a step of establishing a preset feature recognition model in an embodiment of the present application, the step of establishing the preset feature recognition model includes:
step 310: and acquiring a sample corpus, wherein the sample corpus is labeled with text words and syntactic structure characteristics.
In the above steps, before the preset feature recognition model processes the first text information and the second text information, a large batch of text corpora needs to be used for pre-training the preset feature recognition model, and for relevant processing of a news short text, the news corpora is used for training, so that the effect is better.
Step 320: and training the two-way encoder characterization model by adopting the sample corpora to obtain a preset feature recognition model.
In the above steps, at least one word in the sample corpus is replaced with a word mask respectively to obtain a sample corpus including at least one word mask; then, inputting the sample corpus including at least one word mask code into a bidirectional encoder characterization model (BERT), and outputting a context vector of each word mask code in the at least one word mask code through the bidirectional encoder characterization model; determining a word vector corresponding to each word mask based on the context vector and the word vector parameter matrix of each word mask respectively; and finally, training the two-way encoder characterization model based on the word vector corresponding to each word mask until a preset training completion condition is met, and obtaining a preset feature recognition model. The number of neurons in a hidden layer of a bidirectional encoder characterization model is 768, so the dimensionality of a feature vector of each piece of text information output by processing is 768, a short text array is generally used as input in the processing process, and the output result is also a sequence of 768-dimensional feature vectors.
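The mask-and-predict training step described above can be sketched with BertForMaskedLM from the transformers library; the checkpoint, optimizer settings, masked position, and single-example loop are illustrative assumptions rather than the patent's training procedure.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

sample = "雷军在北京发布了小米手机"
inputs = tokenizer(sample, return_tensors="pt")
labels = inputs["input_ids"].clone()

# replace one token with [MASK]; the model must predict it from context
mask_pos = 3
inputs["input_ids"][0, mask_pos] = tokenizer.mask_token_id
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # loss only on the mask

outputs = model(**inputs, labels=labels)  # cross-entropy over the masked position
outputs.loss.backward()
optimizer.step()
```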
As shown in fig. 5, which is a schematic structural diagram of a text information processing apparatus 500 according to an embodiment of the present application, the apparatus can be applied to the electronic device 100 shown in fig. 1 and includes: an obtaining module 510, a parsing module 520, a first calculation module 530, and a second calculation module 540. The modules are related as follows:
the obtaining module 510 is configured to obtain first text information and second text information to be processed.
The parsing module 520 is configured to parse the first text information and the second text information respectively to obtain a first feature vector of the first text information and a second feature vector of the second text information.
A first calculating module 530, configured to calculate a time warping distance between the first feature vector and the second feature vector.
The second calculating module 540 is configured to calculate similarity information between the first text information and the second text information according to the time warping distance.
For a detailed description of the text information processing apparatus 500, please refer to the description of the related method steps in the above embodiments.
An embodiment of the present application further provides a storage medium readable by an electronic device, including: a program that, when run on the electronic device, causes the electronic device to perform all or part of the procedures of the methods in the above embodiments. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like, and may also be a combination of the above kinds of memory.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method of text processing, comprising:
acquiring first text information and second text information to be processed;
respectively analyzing the first text information and the second text information to obtain a first feature vector of the first text information and a second feature vector of the second text information;
calculating a time warping distance between the first feature vector and the second feature vector;
and calculating the similarity information between the first text information and the second text information according to the time warping distance.
2. The method of claim 1, wherein the calculating a time warping distance between the first feature vector and the second feature vector comprises:
calculating the time warping distance between the first feature vector and the second feature vector by dynamic time warping (DTW), solved by dynamic programming.
3. The method of claim 2, wherein the calculating the time warping distance between the first feature vector and the second feature vector by dynamic time warping (DTW) comprises:
when the time warping distance is calculated, calculating a projection matrix of the first feature vector and a projection matrix of the second feature vector by canonical correlation analysis (CCA); wherein the projection matrix of the first feature vector and the projection matrix of the second feature vector are used to calculate the time warping distance.
4. The method according to claim 1, wherein said calculating similarity information between the first text information and the second text information according to the time warping distance comprises:
calculating the similarity information by the following formula:
[formula not reproduced: the original is an image defining Sim(s1, s2) in terms of ctw(s1, s2)]
wherein s1 is the first text information, s2 is the second text information, ctw(s1, s2) represents the time warping distance between the first text information s1 and the second text information s2, and Sim(s1, s2) is the final similarity information.
5. The method of claim 1, wherein the parsing the first text message and the second text message respectively to obtain a first feature vector of the first text message and a second feature vector of the second text message comprises:
performing word segmentation on the first text information to obtain a first keyword set, and performing word segmentation on the second text information to obtain a second keyword set;
and respectively inputting the first keyword set into a preset feature recognition model, outputting the first feature vector, inputting the second keyword set into the preset feature recognition model, and outputting the second feature vector.
6. The method of claim 5, further comprising establishing the preset feature recognition model, the establishing comprising:
obtaining a sample corpus, wherein the sample corpus is labeled with text words and syntactic structural features;
and training a bidirectional encoder characterization model by using the sample corpus to obtain the preset feature recognition model.
7. The method of claim 6, wherein the number of hidden-layer neurons in the bidirectional encoder representation model is 768.
8. A text processing apparatus, comprising:
an acquisition module configured to acquire first text information and second text information to be processed;
the analysis module is used for respectively analyzing the first text information and the second text information to obtain a first feature vector of the first text information and a second feature vector of the second text information;
a first calculation module for calculating a time warping distance between the first feature vector and the second feature vector;
and the second calculation module is used for calculating the similarity information between the first text information and the second text information according to the time warping distance.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the text processing method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program executable by a processor to perform the text processing method of any one of claims 1-7.
Application CN202011643798.3A, filed 2020-12-30 (priority 2020-12-30): Text processing method, device, equipment and storage medium. Publication CN112749256A (en). Status: Pending.

Priority Applications (1)

Application CN202011643798.3A (CN112749256A), priority date 2020-12-30, filing date 2020-12-30: Text processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application CN202011643798.3A (CN112749256A), priority date 2020-12-30, filing date 2020-12-30: Text processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112749256A 2021-05-04

Family

ID=75649430

Family Applications (1)

Application CN202011643798.3A (publication CN112749256A), priority date 2020-12-30, filing date 2020-12-30: Text processing method, device, equipment and storage medium. Status: Pending.

Country Status (1)

Country: CN. Publication: CN112749256A (en).


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108768950A (en) * 2018-04-28 2018-11-06 山东亚华电子股份有限公司 A kind of medical communication account management method and system
CN108710613A (en) * 2018-05-22 2018-10-26 平安科技(深圳)有限公司 Acquisition methods, terminal device and the medium of text similarity
CN109858015A (en) * 2018-12-12 2019-06-07 湖北工业大学 A kind of semantic similarity calculation method and device based on CTW and KM algorithm
CN110442871A (en) * 2019-08-06 2019-11-12 北京百度网讯科技有限公司 Text message processing method, device and equipment
CN110909550A (en) * 2019-11-13 2020-03-24 北京环境特性研究所 Text processing method and device, electronic equipment and readable storage medium
CN111414746A (en) * 2020-04-10 2020-07-14 中国建设银行股份有限公司 Matching statement determination method, device, equipment and storage medium
CN111832301A (en) * 2020-07-28 2020-10-27 电子科技大学 Chinese word vector generation method based on adaptive component n-tuple

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292620A (en) * 2022-08-09 2022-11-04 腾讯科技(深圳)有限公司 Region information identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11544474B2 (en) Generation of text from structured data
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
WO2021135469A1 (en) Machine learning-based information extraction method, apparatus, computer device, and medium
US20220318275A1 (en) Search method, electronic device and storage medium
WO2020147409A1 (en) Text classification method and apparatus, computer device, and storage medium
US20230130006A1 (en) Method of processing video, method of quering video, and method of training model
CN111459977B (en) Conversion of natural language queries
CN111078842A (en) Method, device, server and storage medium for determining query result
CN111324771A (en) Video tag determination method and device, electronic equipment and storage medium
CN113434636A (en) Semantic-based approximate text search method and device, computer equipment and medium
CN111783471A (en) Semantic recognition method, device, equipment and storage medium of natural language
CN108763202A (en) Method, apparatus, equipment and the readable storage medium storing program for executing of the sensitive text of identification
CN115392235A (en) Character matching method and device, electronic equipment and readable storage medium
CN112906368B (en) Industry text increment method, related device and computer program product
CN111444712B (en) Keyword extraction method, terminal and computer readable storage medium
CN112749256A (en) Text processing method, device, equipment and storage medium
CN112765976A (en) Text similarity calculation method, device and equipment and storage medium
US20230334075A1 (en) Search platform for unstructured interaction summaries
CN117194616A (en) Knowledge query method and device for vertical domain knowledge graph, computer equipment and storage medium
CN112347365A (en) Target search information determination method and device
CN113392630A (en) Semantic analysis-based Chinese sentence similarity calculation method and system
CN109189932B (en) Text classification method and device and computer-readable storage medium
CN112732913B (en) Method, device, equipment and storage medium for classifying unbalanced samples
CN113688268B (en) Picture information extraction method, device, computer equipment and storage medium
CN116756596B (en) Text clustering model training method, text clustering device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination