CN117093729A - Retrieval method, system and retrieval terminal based on medical scientific research information - Google Patents

Retrieval method, system and retrieval terminal based on medical scientific research information

Info

Publication number
CN117093729A
Authority
CN
China
Prior art keywords
information
retrieval
medical scientific
user
scientific research
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311336929.7A
Other languages
Chinese (zh)
Other versions
CN117093729B (en)
Inventor
蒋江涛
马杰
金剑
邓小宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North Health Medical Big Data Technology Co ltd
Original Assignee
North Health Medical Big Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North Health Medical Big Data Technology Co ltd
Priority to CN202311336929.7A
Publication of CN117093729A
Application granted
Publication of CN117093729B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Pathology (AREA)
  • Library & Information Science (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a retrieval method, a retrieval system and a retrieval terminal based on medical scientific research information, belonging to the technical field of medical scientific research information processing. The method comprises: acquiring search information input by a user; analyzing the search information using large-model-based natural language processing, including its sentence structure, word meaning relations and context information, and extracting keyword information; configuring the extracted keyword information into a retrieval expression; matching medical scientific research literature in a medical scientific research literature database based on the retrieval expression; and displaying the matched literature on a user interface. Because the large-model-based method relies on natural language processing, it can understand the user's intention more accurately and improve the relevance of the retrieval results. The retrieved documents are displayed to the user, providing a better user experience.

Description

Retrieval method, system and retrieval terminal based on medical scientific research information
Technical Field
The application belongs to the technical field of medical scientific research information retrieval, and particularly relates to a retrieval method, a retrieval system and a retrieval terminal based on medical scientific research information.
Background
In the field of medical research, researchers need to obtain relevant information from a large body of documents and data to support their work. Conventional retrieval methods typically require the user to write the retrieval expression by hand, which can be difficult for non-professionals and can take a significant amount of time.
The existing scheme most similar to the application is keyword-based retrieval. This approach requires the user to edit a retrieval expression and retrieves documents and data through keyword matching. It has the following disadvantages. Editing expressions is difficult: non-professionals may not be familiar with domain terms and expression syntax. Editing is time-consuming: composing a complex retrieval expression takes a lot of time and reduces the user's retrieval efficiency.
In addition, retrieving medical scientific research literature takes a long time because professional vocabulary must first be edited and then combined into a search. If the vocabulary does not form an effective retrieval expression, the desired literature cannot be matched, which degrades researchers' experience of the system.
Disclosure of Invention
The application provides a retrieval method based on medical scientific research information that addresses the difficulty and time cost of manually editing retrieval expressions. The application aims to improve the user's retrieval efficiency and reduce the time the user spends editing expressions.
The method comprises the following steps:
step one, acquiring search information input by a user;
step two, analyzing the search information based on a natural language processing mode of the large model, analyzing sentence structures, word meaning relations and context information, and extracting keyword information;
wherein the large model includes: a plurality of identical encoder layers, each encoder layer comprising a self-attention mechanism and a feed-forward neural network;
the mathematical formula for the self-attention mechanism is as follows:
$$\mathrm{Attention}(Q_1, K_1, V_1) = \mathrm{softmax}\left(\frac{Q_1 K_1^{T}}{\sqrt{d_{k1}}}\right) V_1$$
wherein $Q_1$, $K_1$ and $V_1$ represent the input matrices of queries, keys and values, respectively, and $d_{k1}$ represents the dimension of the attention mechanism;
the mathematical formula of the feedforward neural network is as follows:
$$\mathrm{FFN}(x_1) = \max(0,\ x_1 W_1 + b_1)\, W_2 + b_2$$
where $x_1$ represents the input vector, and $W_1$, $b_1$, $W_2$ and $b_2$ represent parameters of the model;
the large model further includes: a plurality of identical decoder layers, each decoder layer containing a self-attention mechanism and an encoder-decoder attention mechanism;
in the large model structure, the connection between the encoder layer and the decoder layer uses residual connection and layer normalization;
step three, acquiring a logical operator set by the user, and combining the extracted keyword information based on that logical operator to configure a retrieval expression;
step four, matching medical scientific research literature in a medical scientific research literature database based on the retrieval expression;
step five, displaying the matched medical scientific research literature on a user interface.
It should be further noted that, in step two, analyzing the search information based on the natural language processing mode of the large model further includes:
segmenting the input natural language text into words;
determining the part of speech of each word, and extracting keyword information.
It should be further noted that, in step two, analyzing the search information based on the natural language processing mode of the large model further includes:
analyzing the sentence structure of the search information, determining the dependency relationships among the words, and extracting keyword information.
It should be further noted that, in step two, analyzing the search information based on the natural language processing mode of the large model further includes: performing lexical analysis on the sentences in the search information input by the user, and segmenting the sentences into words.
It should be further noted that performing lexical analysis on a sentence includes: determining the dependency relationships between the words in the sentence, implemented using dependency parsing or phrase-structure parsing.
It should be further noted that the logical operators include: AND logic, OR logic, and NOT logic.
It should be further noted that step four further includes: storing a plurality of medical scientific research documents in the medical scientific research literature database;
configuring each medical scientific research document with keyword tags;
matching the retrieval expression against the keyword tags in the medical scientific research literature database;
and displaying, on the user interface, the medical scientific research documents corresponding to the matched keyword tags.
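The tag-matching step can be sketched as follows. This is only a minimal illustration, assuming each document carries a set of keyword tags and that the retrieval expression has already been reduced to three keyword lists (required, optional, excluded). The names `Document` and `match_documents` and the sample records are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record: a title plus its keyword tags."""
    title: str
    tags: set = field(default_factory=set)


def match_documents(docs, all_of=(), any_of=(), none_of=()):
    """Return documents whose tags satisfy AND / OR / NOT constraints."""
    matched = []
    for doc in docs:
        if not all(k in doc.tags for k in all_of):
            continue  # AND logic: every required keyword must be a tag
        if any_of and not any(k in doc.tags for k in any_of):
            continue  # OR logic: at least one optional keyword must be a tag
        if any(k in doc.tags for k in none_of):
            continue  # NOT logic: no excluded keyword may be a tag
        matched.append(doc)
    return matched


# Made-up sample database, for illustration only.
database = [
    Document("Trial of a new targeted therapy", {"cancer", "drug", "clinical trial"}),
    Document("Surgical outcomes in oncology", {"cancer", "surgery"}),
    Document("Antihypertensive drug review", {"hypertension", "drug"}),
]

hits = match_documents(database, all_of=["cancer", "drug"])
print([d.title for d in hits])
```

A production system would more likely delegate this matching to a database or search index; the loop above only makes the three logical modes explicit.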
The application also provides a retrieval system based on medical scientific research information, which comprises: an information input module, an information analysis module, an expression configuration module, a document matching module and a document display module;
the information input module is used for acquiring search information input by a user;
the information analysis module is used for analyzing the search information with the natural language processing mode of the large model, analyzing sentence structure, word meaning relations and context information, and extracting keyword information;
wherein the large model includes: a plurality of identical encoder layers, each encoder layer comprising a self-attention mechanism and a feed-forward neural network;
the mathematical formula for the self-attention mechanism is as follows:
$$\mathrm{Attention}(Q_1, K_1, V_1) = \mathrm{softmax}\left(\frac{Q_1 K_1^{T}}{\sqrt{d_{k1}}}\right) V_1$$
wherein $Q_1$, $K_1$ and $V_1$ represent the input matrices of queries, keys and values, respectively, and $d_{k1}$ represents the dimension of the attention mechanism;
the mathematical formula of the feedforward neural network is as follows:
$$\mathrm{FFN}(x_1) = \max(0,\ x_1 W_1 + b_1)\, W_2 + b_2$$
where $x_1$ represents the input vector, and $W_1$, $b_1$, $W_2$ and $b_2$ represent parameters of the model;
the large model further includes: a plurality of identical decoder layers, each decoder layer containing a self-attention mechanism and an encoder-decoder attention mechanism;
in the large model structure, the connection between the encoder layer and the decoder layer uses residual connection and layer normalization;
the expression configuration module is used for acquiring the logical operator set by the user and combining the extracted keyword information based on that logical operator to configure a retrieval expression;
the document matching module is used for matching medical scientific research documents in the medical scientific research document database according to the retrieval expression;
and the document display module is used for displaying information about the retrieval process and displaying the matched medical scientific research documents.
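The way the five modules hand data to one another can be sketched as a chain of functions. This is an illustrative sketch under stated assumptions, not the application's implementation: all function names are hypothetical, the analysis module is a trivial stop-word filter standing in for the large model, and only AND and OR are handled.

```python
def input_module(raw):
    """Information input module: normalise the user's text."""
    return raw.strip().lower()


def analysis_module(text):
    """Information analysis module.

    Stand-in for the large model: keeps words outside a small stop list.
    """
    stop = {"find", "for", "the", "a", "an", "of", "about", "on"}
    return [word for word in text.split() if word not in stop]


def expression_module(keywords, operator="AND"):
    """Expression configuration module: join keywords with the user's operator."""
    return f" {operator} ".join(keywords)


def matching_module(keywords, operator, database):
    """Document matching module: compare keywords with each document's tags."""
    test = all if operator == "AND" else any
    return [title for title, tags in database.items()
            if test(k in tags for k in keywords)]


def display_module(expression, titles):
    """Document display module: render the expression and the matched titles."""
    return "\n".join([f"Expression: {expression}"] + [f"- {t}" for t in titles])


# Made-up sample database mapping titles to keyword tags.
database = {
    "Trial of a new targeted therapy": {"cancer", "drug"},
    "Surgical outcomes in oncology": {"cancer", "surgery"},
}

keywords = analysis_module(input_module("  Find cancer drug "))
expression = expression_module(keywords, "AND")
titles = matching_module(keywords, "AND", database)
print(display_module(expression, titles))
```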
The application also provides a retrieval terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the retrieval method based on medical scientific research information when executing the program.
From the above technical scheme, the application has the following advantages:
In the retrieval method based on medical scientific research information, a user states the retrieval requirement in natural language, and the interface passes the user input to a large model for natural language understanding. The large model converts the user input into a retrieval expression and passes it to the retrieval system. The retrieval system retrieves relevant medical scientific research documents and data from the database according to the retrieval expression and returns the result to the user interface for display. The user therefore does not need to edit complex retrieval expressions and can search in natural language, which reduces the user's learning cost and editing time. Moreover, because the large model has natural language understanding capability, it can understand the user's intention more accurately and improve the relevance of the retrieval results. The user-friendly interface provides a better user experience and allows non-professionals to retrieve documents easily as well.
Drawings
To illustrate the technical solutions of the present application more clearly, the drawings needed in the description are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a retrieval method based on medical scientific research information;
FIG. 2 is a schematic diagram of a retrieval system based on medical scientific research information.
Detailed Description
The retrieval method based on medical scientific research information provided by the application is mainly a retrieval mode for the medical scientific research field, used to provide researchers with retrieval of medical scientific research documents. To make such retrieval easier for researchers, the application can acquire and process the associated data based on artificial intelligence technology. The method may involve technologies such as dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and operation/interaction systems. It mainly draws on computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning and the like. Deep learning, in turn, generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction. With these techniques the method can quickly match the medical scientific research literature that researchers want: it can return a corresponding number of documents for researchers' reference, and it can also locate one specific document accurately. It thereby effectively addresses the problems that medical scientific research literature retrieval takes a long time, that professional vocabulary must be edited and then combined into a search, and that, if the vocabulary does not form an effective retrieval expression, the desired literature cannot be matched and researchers' experience of the system suffers.
The retrieval method based on medical scientific research information can be applied to one or more retrieval terminals. A retrieval terminal is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices and the like.
The retrieval terminal may be any electronic product that can interact with a user, such as a personal computer, tablet computer, smart phone, personal digital assistant (PDA), or Internet Protocol television (IPTV).
The network in which the retrieval terminal is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to FIG. 1, a flowchart of a retrieval method based on medical scientific research information in an embodiment is shown. The method includes:
S101, acquiring search information input by a user;
According to an embodiment of the present application, the user may input the information to be retrieved through a device such as a keyboard, keypad, switch, dial, mouse, trackball or voice recognizer; input is not limited to typing and speech. The input can be in the form of sentences or phrases.
S102, analyzing the search information based on a natural language processing mode of a large model, analyzing sentence structures, word meaning relations and context information, and extracting keyword information;
According to the embodiment of the application, the natural language processing mode of the large model may adopt models such as T5, GLM and GPT, which have strong natural language understanding capability. The large model can understand the natural language input by the user, extract the key information in it, and convert that information into a retrieval expression that the retrieval system can understand.
According to the embodiment, the semantic understanding capability of the large model is used to convert the search information input by the user into keyword information automatically, and the corresponding retrieval expression is then matched, so that retrieval can be automated. The retrieved documents are returned to the user. The user can therefore interact with the retrieval system directly in natural language and no longer needs to construct a retrieval expression manually, which improves the operability and efficiency of the retrieval process.
As an example, the large model of the present embodiment may use a Transformer-encoder, Transformer-decoder and Transformer structure. The Transformer-encoder is mainly used to encode input sequences, such as natural language text. It is composed of multiple identical encoder layers, each of which contains a self-attention mechanism and a feed-forward neural network.
The mathematical formula for the self-attention mechanism is as follows:
$$\mathrm{Attention}(Q_1, K_1, V_1) = \mathrm{softmax}\left(\frac{Q_1 K_1^{T}}{\sqrt{d_{k1}}}\right) V_1$$
where $Q_1$, $K_1$ and $V_1$ represent the input matrices of queries, keys and values, respectively, and $d_{k1}$ represents the dimension of the attention mechanism.
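The self-attention formula can be checked numerically with a small sketch. The code below is a plain-Python illustration that uses only the standard library and treats matrices as lists of rows. The helper names and the toy inputs are assumptions made for this example, and the scaling dimension is taken to be the length of a key vector.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V and also return the weights."""
    d_k = len(k[0])  # assumed: d_k is the length of each key vector
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: 2 queries, 3 key/value pairs, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

output, weights = attention(Q, K, V)
print(weights)  # each row is a probability distribution over the 3 keys
print(output)   # each row is a weighted average of the rows of V
```

Each row of `weights` sums to 1, so every output row is a convex combination of the value vectors, weighted by how well the query matches each key.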
The mathematical formula of the feedforward neural network is as follows:
$$\mathrm{FFN}(x_1) = \max(0,\ x_1 W_1 + b_1)\, W_2 + b_2$$
where $x_1$ represents the input vector, and $W_1$, $b_1$, $W_2$ and $b_2$ represent parameters of the model.
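A minimal sketch of the feed-forward formula for a single input vector follows. The weight values are made up for illustration, and the function name `ffn` is an assumption of this example. The inner `max(0, ...)` is the ReLU activation applied element-wise to the hidden layer.

```python
def ffn(x, w1, b1, w2, b2):
    """Compute max(0, x W_1 + b_1) W_2 + b_2 for one input vector x.

    w1 and w2 are lists of rows; zip(*w) iterates over their columns.
    """
    hidden = [
        max(0.0, sum(xi * w for xi, w in zip(x, col)) + b)
        for col, b in zip(zip(*w1), b1)
    ]
    return [
        sum(h * w for h, w in zip(hidden, col)) + b
        for col, b in zip(zip(*w2), b2)
    ]


# Made-up parameters: W1 is 2x3, W2 is 3x2.
W1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]
B1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B2 = [0.5, -0.5]

y = ffn([1.0, 2.0], W1, B1, W2, B2)
print(y)
```

With these values the hidden pre-activations are 1, 3 and -1.5, so the third hidden unit is clipped to 0 before the second linear map is applied.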
The Transformer-decoder structure is mainly used to generate output sequences, for example in machine translation tasks. It is composed of multiple identical decoder layers, each containing a self-attention mechanism and an encoder-decoder attention mechanism. The mathematical formula of the self-attention mechanism is the same as in the Transformer-encoder.
The mathematical formula of the encoder-decoder attention mechanism of this embodiment is as follows:
$$\mathrm{Attention}(Q_2, K_2, V_2) = \mathrm{softmax}\left(\frac{Q_2 K_2^{T}}{\sqrt{d_{k2}}}\right) V_2$$
where $Q_2$ represents the query vectors of the decoder, $K_2$ represents the key vectors of the encoder, $V_2$ represents the value vectors of the encoder, and $d_{k2}$ is the corresponding attention dimension. The mathematical formula of the feedforward neural network is the same as that in the Transformer-encoder.
The Transformer structure combines the Transformer-encoder and the Transformer-decoder and is used for sequence-to-sequence tasks such as machine translation. It is formed by stacking a plurality of encoder layers and a plurality of decoder layers.
In the Transformer structure, the connection between the encoder layer and the decoder layer uses residual connection and layer normalization.
The mathematical formula for the residual connection is as follows:
$$\mathrm{LayerNorm}(x_3 + \mathrm{Sublayer}(x_3))$$
where $x_3$ represents the input vector and $\mathrm{Sublayer}(x_3)$ represents the output of the sub-layer.
The mathematical formula for layer normalization is as follows:
$$\mathrm{LayerNorm}(x_3) = \frac{x_3 - \mathrm{mean}(x_3)}{\sqrt{\mathrm{var}(x_3) + \epsilon}} \cdot \gamma + \beta$$
where $\mathrm{mean}(x_3)$ and $\mathrm{var}(x_3)$ represent the mean and variance of $x_3$, respectively, $\epsilon$ (Ep) is a small constant for numerical stability, and $\gamma$ (GA) and $\beta$ (Be) are learnable parameters.
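The residual connection and layer normalization can be combined in a few lines. The sketch below is illustrative only: the argument names `eps`, `gamma` and `beta` are this example's names for the small constant and the two learnable parameters, and the sub-layer is a made-up function that halves its input.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise x to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) * g + b
            for v, g, b in zip(x, gamma, beta)]


def residual_block(x, sublayer, gamma, beta):
    """Compute LayerNorm(x + Sublayer(x))."""
    summed = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(summed, gamma, beta)


x = [1.0, 2.0, 3.0, 4.0]
out = residual_block(
    x,
    sublayer=lambda v: [0.5 * t for t in v],  # hypothetical sub-layer
    gamma=[1.0] * 4,                          # scale initialised to 1
    beta=[0.0] * 4,                           # shift initialised to 0
)
print(out)
```

With the scale set to 1 and the shift set to 0, the output has a mean of approximately 0 and a variance of approximately 1, whatever the sub-layer returns.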
Through this processing, the Transformer model can effectively capture the context information in the input sequence and generate a corresponding output sequence.
In one exemplary embodiment, training of a large model generally includes the following steps:
Data preparation: a large-scale training dataset is prepared, including input samples and the corresponding target outputs.
Model initialization: the weights and bias parameters of the neural network are initialized.
Forward propagation: the input samples are propagated forward through the neural network to obtain the model's predicted output.
Loss calculation: the predicted output of the model is compared with the target output, and the value of the loss function is calculated to measure the difference between the predicted result and the real result.
Back propagation: the gradient of the loss function with respect to the model parameters, i.e. its derivative with respect to the weights and biases, is calculated by the back propagation algorithm.
Parameter updating: the parameters of the model are updated from the gradient information using an optimization algorithm (such as gradient descent) so that the loss function gradually decreases.
Repeated iteration: forward propagation, loss calculation, back propagation and parameter updating are repeated until a predetermined number of training rounds or a convergence condition is reached.
Training a large model usually requires large-scale computing resources and long training time, and may be accelerated using techniques such as distributed training and parallel computing.
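The training steps listed above can be illustrated on a deliberately tiny model. The sketch below fits a one-variable linear model by full-batch gradient descent. It is not a large model, and the data, learning rate and epoch count are made up, but each line maps onto one of the listed steps.

```python
def train(samples, lr=0.05, epochs=1000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0                      # model initialisation
    n = len(samples)
    loss = 0.0
    for _ in range(epochs):              # repeated iteration
        loss = grad_w = grad_b = 0.0
        for x, target in samples:
            pred = w * x + b             # forward propagation
            err = pred - target
            loss += err ** 2 / n         # loss calculation
            grad_w += 2 * err * x / n    # back propagation: d(loss)/dw
            grad_b += 2 * err / n        # back propagation: d(loss)/db
        w -= lr * grad_w                 # parameter update
        b -= lr * grad_b
    return w, b, loss


# Data preparation: made-up samples drawn from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b, final_loss = train(data)
print(w, b, final_loss)
```

In a neural network the gradients are obtained by automatic differentiation rather than by hand, but the loop structure is the same.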
In this embodiment, the large model learns the statistical regularities and semantic information of language by being exposed to a large amount of natural language text during training. It acquires rich linguistic knowledge by modeling the relationships between words, sentences and contexts in the text data. When a user inputs natural language, the large model can understand and process the input according to the learned knowledge, extract the key information in it, and convert that information into a retrieval expression that the retrieval system can understand.
According to an embodiment of the application, the method uses natural language processing techniques to semantically analyze and understand the user's natural language input. By analyzing the sentence structure, word meaning relations and context information of the input, the system can accurately understand the user's retrieval intention and extract the key information. Illustratively, the user may enter "find new drugs for treating cancer", and the system will understand that the user needs to find new drugs associated with treating cancer. The input is then split, for example on verbs and nouns, and fixed expressions, technical terms and the like can be split out as units. The identified keywords are then combined into a retrieval expression. For example, "find new drugs for treating cancer" can be split into "treating cancer" and "drug", which are matched against the corresponding documents. The two keywords are combined according to the logical mode the user selects: if the user selects AND logic, "treating cancer" and "drug" are joined with AND to form a retrieval expression that meets the retrieval requirement.
The natural language processing approach of the present application generally includes the following stages:
Word segmentation: the input natural language text is segmented into words.
Part-of-speech tagging: the part of speech of each word (e.g., noun, verb) is determined.
Syntactic analysis: the structure of the sentence is analyzed and the dependency relationships between the words are determined.
Semantic analysis: the semantics of the sentence are analyzed to determine what the sentence means and what it expresses.
Semantic understanding: key information and semantic roles are extracted from the sentence, and its meaning and intention are understood.
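The first two stages and a crude form of keyword extraction can be sketched with a hand-written lexicon. This is a toy illustration only: the lexicon, the tag names and the rule for dropping command verbs are assumptions of this example, and the syntactic and semantic stages, which the application assigns to the large model, are omitted.

```python
import re

# Hypothetical hand-written lexicon; a real system would use a trained tagger.
LEXICON = {
    "find": "VERB", "new": "ADJ", "drugs": "NOUN",
    "for": "ADP", "treating": "VERB", "cancer": "NOUN",
}


def segment(text):
    """Word segmentation: lower-case the text and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def tag(words):
    """Part-of-speech tagging by lookup; unknown words default to NOUN."""
    return [(w, LEXICON.get(w, "NOUN")) for w in words]


def extract_keywords(tagged, command_verbs=("find",)):
    """Keep nouns, plus verbs that are not search commands such as 'find'."""
    return [w for w, pos in tagged
            if pos == "NOUN" or (pos == "VERB" and w not in command_verbs)]


tagged = tag(segment("Find new drugs for treating cancer"))
keywords = extract_keywords(tagged)
print(tagged)
print(keywords)
```

For Chinese input, word segmentation itself is a non-trivial step because words are not separated by spaces; the regular expression here only works for space-delimited languages.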
In the process of semantic analysis and understanding, commonly used mathematical models include:
Word embedding models (e.g., Word2Vec, GloVe): words are mapped to a continuous vector space, capturing the semantic relationships between them.
Recurrent neural network (Recurrent Neural Network, RNN): used for processing sequence data such as sentences or text.
Attention mechanism (Attention Mechanism): used to weight different parts of the input when processing long text.
Transformer model (Transformer): a neural network model based on the self-attention mechanism, used for processing sequence data.
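How a word embedding captures semantic relationships can be shown with cosine similarity between vectors. The three-dimensional vectors below are invented for illustration and are not taken from any trained Word2Vec or GloVe model; real embeddings typically have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings.
EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "keyboard": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other vocabulary word with the highest cosine similarity."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


print(nearest("tumor"))
print(round(cosine(EMBEDDINGS["tumor"], EMBEDDINGS["keyboard"]), 3))
```

Semantically related words end up close together in the vector space, which is what lets a retrieval system treat near-synonyms in a query as related terms.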
In this embodiment, after the search information input by the user is analyzed, a sentence structure and a word meaning relationship may be formed. Natural language processing (Natural Language Processing, NLP) techniques and corresponding mathematical models may be generally employed herein.
The following is one way for the large model to accomplish this:
Lexical analysis (Lexical Analysis): the large model first performs lexical analysis on the sentence input by the user and segments it into words. This may be accomplished using a lexical analyzer or a pre-trained lexical analysis model.
Syntactic analysis (Syntactic Analysis): next, the large model performs syntactic analysis, analyzes the structure of the sentence, and determines the dependency relationships between words, such as a subject-predicate relationship, a verb-object relationship, and the like. Syntactic analysis may be implemented using a dependency syntactic analyzer, a phrase structure syntactic analyzer, or a pre-trained syntactic analysis model.
Semantic analysis (Semantic Analysis): in the semantic analysis stage, the large model understands the semantics of the sentence and determines what the sentence means and what it is intended to express. This includes tasks such as word sense disambiguation, coreference resolution, semantic role labeling, etc. Semantic analysis may be implemented using word sense disambiguation models, coreference resolution models, semantic role labeling models, or pre-trained semantic analysis models.
Context modeling (Context Modeling): to better understand the meaning of a sentence, the large model considers the context information of the sentence, including both the preceding and the following text. Context modeling may model and represent sentences using context-aware models, such as recurrent neural networks (RNNs) or Transformer models.
Through the above steps, the large model can analyze the sentences input by the user and extract their structure, word meaning relations and context information, so that the intention and the requirement of the user are better understood. In this way, the large model may further translate the user input into a search expression that the search system can understand and process, and provide more accurate search results. It should be noted that the specific implementation and the models used may vary depending on the application scenario and the specific requirements.
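As a deliberately simplified, non-limiting sketch of the lexical analysis and keyword extraction described above, the following replaces the large model with a regular-expression tokenizer and a small stop-word list. The stop-word list, the function names `tokenize` and `extract_keywords`, and the sample query are all assumptions for illustration; the embodiment itself relies on the large model and performs syntactic, semantic and context analysis that this sketch omits.

```python
import re

# Hypothetical stop-word list; the described system would rely on the large model instead.
STOP_WORDS = {
    "a", "an", "the", "for", "of", "to", "in", "on", "and", "or",
    "with", "find", "me", "i", "want", "papers", "about",
}


def tokenize(text):
    """Lexical analysis stand-in: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text):
    """Keep the first occurrence of every token that is not a stop word."""
    seen = set()
    keywords = []
    for token in tokenize(text):
        if token not in STOP_WORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


print(extract_keywords("Find papers about drugs for the treatment of cancer"))
# prints ['drugs', 'treatment', 'cancer']
```

A tokenizer of this kind only works for space-delimited languages; Chinese input would require a dedicated word segmentation step.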
S103, acquiring a logical operator set by a user, and combining the extracted keyword information based on the logical operator set by the user to configure a retrieval expression;
The search expression of the present embodiment is formed by combining the extracted keyword information.
It should be noted that the manner of combining the keyword information is determined by the logical operators set by the user: the extracted keyword information is combined based on those logical operators to form the search expression.
Optionally, the logical operators include: AND logic, OR logic, and NOT logic.
For example, when searching for medication for cancer treatment, the user may select a logical relationship while entering the search information.
Of course, the system may default the logical relationship to AND logic, namely "treatment" AND "cancer" AND "medication", although it may also be set to OR logic.
Thus, after natural language processing, the natural language input by the user is converted into a structured search expression that the search system can understand and process.
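One possible way to configure the search expression in step S103 is sketched below, assuming the keywords have already been extracted and that the expression is a plain string with quoted terms. The function name `build_expression`, its parameters, and the quoting and parenthesis conventions are hypothetical and are not prescribed by this embodiment.

```python
def build_expression(keywords, operator="AND", excluded=()):
    """Join keywords with AND/OR and append NOT clauses for excluded terms.

    AND is used by default, mirroring the default logical relationship
    described in the embodiment.
    """
    operator = operator.upper()
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")
    if not keywords:
        raise ValueError("at least one keyword is required")

    expression = f" {operator} ".join(f'"{word}"' for word in keywords)
    for term in excluded:
        # NOT logic: wrap what has been built so far and exclude the term.
        expression = f'({expression}) NOT "{term}"'
    return expression


print(build_expression(["treatment", "cancer", "medication"]))
# prints "treatment" AND "cancer" AND "medication"
print(build_expression(["treatment", "cancer"], operator="or", excluded=["surgery"]))
# prints ("treatment" OR "cancer") NOT "surgery"
```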
S104, matching the medical scientific literature in the medical scientific literature database based on the retrieval expression;
It will be appreciated that a large amount of medical scientific literature is collected and pre-stored in the medical scientific literature database; of course, the database is not limited to medical scientific literature and may also contain other documents. To facilitate retrieval by the system, each medical scientific document is configured with a keyword tag; if a document has many chapters, a plurality of keyword tags may be configured for it, so that the document can be matched simply by matching the search keywords against its tags.
The keyword tags may be set based on the title of the medical scientific document, its field, its abstract, the content of its core chapters, and the like, and may be assigned automatically by the system or set manually.
In this way, the search expression is matched against the keyword tags in the medical scientific literature database, and the medical scientific literature corresponding to the matched keyword tags is displayed on a user interface.
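The keyword-tag matching of step S104 may be sketched as follows, under the assumption that the search expression has already been decomposed into terms that must all appear (AND), terms of which at least one must appear (OR), and terms that must not appear (NOT). The function names, the in-memory `documents` dictionary, and the sample titles and tags are hypothetical; the embodiment does not specify a storage format for the database.

```python
def matches(tags, required=(), optional=(), excluded=()):
    """Return True if a document's keyword tags satisfy the AND/OR/NOT terms."""
    tags = {tag.lower() for tag in tags}
    if any(term.lower() in tags for term in excluded):       # NOT logic
        return False
    if not all(term.lower() in tags for term in required):   # AND logic
        return False
    if optional and not any(term.lower() in tags for term in optional):  # OR logic
        return False
    return True


def search(documents, **criteria):
    """Return the titles of all documents whose tags satisfy the criteria."""
    return [title for title, tags in documents.items() if matches(tags, **criteria)]


# Hypothetical database: document title -> keyword tags.
documents = {
    "Doc A": {"cancer", "treatment", "medication"},
    "Doc B": {"cancer", "surgery"},
    "Doc C": {"diabetes", "medication"},
}

print(search(documents, required=["cancer", "medication"]))
# prints ['Doc A']
print(search(documents, optional=["cancer", "diabetes"], excluded=["surgery"]))
# prints ['Doc A', 'Doc C']
```

A linear scan is adequate only for a small collection; a production database would more likely use an inverted index from tag to documents.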
S105, displaying the matched medical scientific research literature on a user interface.
According to the embodiment of the application, a user-friendly interface is provided, and a user can input the retrieval requirement in a natural language expression mode. The user interface may also provide real-time feedback to help the user better understand and adjust the retrieval needs.
Here, the search results may be continuously optimized according to user feedback, in one of the following manners:
Feedback loop: after the user obtains the search results, the system may ask the user to provide feedback, such as scores or likes/dislikes. Based on the user feedback, the system may adjust the search algorithm or reorder the results to provide results that better meet the user's needs.
Reinforcement learning: the system may use a reinforcement learning algorithm to learn how to optimize the search results through interactions with the user. The system adjusts the parameters of the search algorithm according to the feedback (reward signal) of the user so that the future search result meets the user requirement.
User model: the system can build a user model according to the historical behaviors and preferences of the user, predict the preferences and demands of the user by using the model, and optimize the retrieval result according to the prediction result.
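The feedback loop described above may be sketched, in a minimal and non-limiting form, as a re-ranking step that adds accumulated like/dislike votes to each document's base relevance score. The class name `FeedbackReranker`, the linear scoring rule, and the `weight` parameter are assumptions for illustration only; the embodiment does not specify how feedback is combined with relevance, and the reinforcement learning and user model variants are not shown.

```python
from collections import defaultdict


class FeedbackReranker:
    """Re-orders search results using accumulated like/dislike feedback."""

    def __init__(self, weight=0.5):
        self.weight = weight           # influence of one net vote on the score
        self.votes = defaultdict(int)  # document id -> likes minus dislikes

    def record(self, doc_id, liked):
        """Store one piece of user feedback for a document."""
        self.votes[doc_id] += 1 if liked else -1

    def rerank(self, scored_results):
        """Sort (doc_id, base_score) pairs by base score plus weighted net votes."""
        return sorted(
            scored_results,
            key=lambda item: item[1] + self.weight * self.votes.get(item[0], 0),
            reverse=True,
        )


reranker = FeedbackReranker()
results = [("doc-1", 1.0), ("doc-2", 0.9)]
reranker.record("doc-2", liked=True)
print(reranker.rerank(results))
# prints [('doc-2', 0.9), ('doc-1', 1.0)]
```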
It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
According to the retrieval method based on medical scientific research information, a user inputs retrieval requirements in a natural language expression mode, and the interface transmits the user input to the large model for natural language understanding. The large model converts the user input into a retrieval expression and passes it to the retrieval system. The retrieval system retrieves relevant medical scientific research documents and data from the database according to the retrieval expression, and returns the result to the user interface for display to the user. Therefore, the user does not need to edit complex search expressions, and searches in a natural language expression mode, so that the learning cost and editing time of the user are reduced. Moreover, the large model related by the application has natural language understanding capability, so that the intention of a user can be more accurately understood, and the relevance of a search result is improved. The user-friendly interface provides a better user experience that enables non-professionals to easily retrieve as well.
The following is an embodiment of a medical scientific information-based search system provided by the embodiment of the present disclosure, which belongs to the same inventive concept as the medical scientific information-based search method of the above embodiments, and details which are not described in detail in the medical scientific information-based search system embodiment may refer to the above medical scientific information-based search method embodiment.
As shown in fig. 2, the system includes: the system comprises an information input module, an information analysis module, an expression configuration module, a document matching module and a document display module;
the information input module provides an input device through which the user inputs information, and acquires the search information input by the user via the input device;
the information analysis module analyzes the search information based on the natural language processing mode of the large model, analyzes sentence structure, word meaning relation and context information, and extracts keyword information;
the expression configuration module is used for configuring the extracted keyword information into a retrieval expression;
the document matching module is used for matching the medical scientific literature in the medical scientific literature database based on the retrieval expression;
the document display module provides a display of system operation information, displays retrieval process information, and displays the matched medical scientific research documents on a user interface.
The units and algorithm steps of the examples described in connection with the embodiments disclosed herein for the medical scientific research information-based retrieval system provided by the application can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of each example have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
As will be readily understood by those skilled in the art from the description of the above embodiments, the retrieval system based on medical scientific research information described herein may be implemented by software or by a combination of software and necessary hardware. Accordingly, the technical solution according to the disclosed embodiments of the retrieval method based on medical scientific research information may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the retrieval method according to the disclosed embodiments.
In embodiments of the present application, computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The retrieval method based on the medical scientific research information is characterized by comprising the following steps:
step one, acquiring search information input by a user;
step two, analyzing the search information based on a natural language processing mode of the large model, analyzing sentence structures, word meaning relations and context information, and extracting keyword information;
wherein the large model includes: a plurality of identical encoder layers, each encoder layer comprising a self-attention mechanism and a feed-forward neural network;
the mathematical formula for the self-attention mechanism is as follows:
$\mathrm{Attention}(Q_1, K_1, V_1) = \mathrm{softmax}\left(\frac{Q_1 K_1^{T}}{\sqrt{d_{k1}}}\right) V_1$;
wherein $Q_1$, $K_1$ and $V_1$ represent input matrices of queries, keys and values, respectively, and $d_{k1}$ represents the dimension of the attention mechanism;
the mathematical formula of the feedforward neural network is as follows:
$\mathrm{FFN}(x_1) = \max(0,\ x_1 W_1 + b_1)\, W_2 + b_2$;
where $x_1$ represents the input vector, and $W_1$, $b_1$, $W_2$ and $b_2$ represent parameters of the model;
the large model further includes: a plurality of identical decoder layers, each decoder layer containing a self-attention mechanism and an encoder-decoder attention mechanism;
in large model structures, the connection between the encoder layer and the decoder layer uses residual connection and layer normalization;
step three, acquiring a logical operator set by a user, and combining the extracted keyword information based on the logical operator set by the user to configure a retrieval expression;
step four, matching with medical scientific research literature in a medical scientific research literature database based on the retrieval expression;
step five, displaying the matched medical scientific research literature on a user interface.
2. The medical scientific research information-based retrieval method according to claim 1, wherein the analyzing the retrieval information based on the natural language processing mode of the large model in the second step further comprises:
segmenting the input natural language text into words;
determining the part of speech of each word, and extracting keyword information.
3. The medical scientific research information-based retrieval method according to claim 1, wherein the analyzing the retrieval information based on the natural language processing mode of the large model in the second step further comprises:
and analyzing sentence structures in the search information, determining the dependency relationship among the words, and extracting keyword information.
4. The medical scientific research information-based retrieval method according to claim 3, wherein the parsing of the retrieval information based on the natural language processing method of the large model in the second step further comprises: and performing lexical analysis on sentences in the search information input by the user, and segmenting the sentences into words.
5. The method for retrieving information based on medical science research of claim 4, wherein lexical analysis of sentences comprises: the dependency relationship between words in the sentence is determined and implemented using a dependency syntax analysis manner or a phrase structure syntax analysis manner.
6. The medical research information-based retrieval method of claim 1, wherein the logical operators include: AND logic, OR logic, and NOT logic.
7. The medical research information-based retrieval method of claim 1, wherein step four further comprises: a plurality of medical scientific literature is stored in a medical scientific literature database;
each medical scientific literature is configured with a keyword tag;
matching the keyword labels in the medical scientific literature database based on the retrieval expression;
and displaying the medical scientific literature corresponding to the matched keyword label on a user interface.
8. A retrieval system based on medical scientific research information, characterized in that the system implements the retrieval method based on medical scientific research information according to any one of claims 1 to 7;
the system comprises: the system comprises an information input module, an information analysis module, an expression configuration module, a document matching module and a document display module;
the information input module is used for acquiring search information input by a user;
the information analysis module is used for analyzing the search information by combining the natural language processing mode of the large model, analyzing sentence structure, word meaning relation and context information, and extracting keyword information;
wherein, big model includes: a plurality of identical encoder layers, each encoder layer comprising a self-attention mechanism and a feed-forward neural network;
the mathematical formula for the self-attention mechanism is as follows:
$\mathrm{Attention}(Q_1, K_1, V_1) = \mathrm{softmax}\left(\frac{Q_1 K_1^{T}}{\sqrt{d_{k1}}}\right) V_1$;
wherein $Q_1$, $K_1$ and $V_1$ represent input matrices of queries, keys and values, respectively, and $d_{k1}$ represents the dimension of the attention mechanism;
the mathematical formula of the feedforward neural network is as follows:
$\mathrm{FFN}(x_1) = \max(0,\ x_1 W_1 + b_1)\, W_2 + b_2$;
where $x_1$ represents the input vector, and $W_1$, $b_1$, $W_2$ and $b_2$ represent parameters of the model;
the large model further includes: a plurality of identical decoder layers, each decoder layer containing a self-attention mechanism and an encoder-decoder attention mechanism;
in large model structures, the connection between the encoder layer and the decoder layer uses residual connection and layer normalization;
the expression configuration module is used for acquiring the logical operators set by the user, combining the extracted keyword information based on the logical operators set by the user, and configuring the combined keyword information into a retrieval expression;
the document matching module is used for matching with medical scientific research documents in the medical scientific research document database according to the retrieval expression;
and the document display module is used for displaying the information of the retrieval process and displaying the matched medical scientific research documents.
9. A search terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the search method based on medical scientific information according to any one of claims 1 to 7 when executing the program.
CN202311336929.7A 2023-10-17 2023-10-17 Retrieval method, system and retrieval terminal based on medical scientific research information Active CN117093729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311336929.7A CN117093729B (en) 2023-10-17 2023-10-17 Retrieval method, system and retrieval terminal based on medical scientific research information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311336929.7A CN117093729B (en) 2023-10-17 2023-10-17 Retrieval method, system and retrieval terminal based on medical scientific research information

Publications (2)

Publication Number Publication Date
CN117093729A true CN117093729A (en) 2023-11-21
CN117093729B CN117093729B (en) 2024-01-09

Family

ID=88783580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311336929.7A Active CN117093729B (en) 2023-10-17 2023-10-17 Retrieval method, system and retrieval terminal based on medical scientific research information

Country Status (1)

Country Link
CN (1) CN117093729B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018031656A1 (en) * 2016-08-09 2018-02-15 Ripcord, Inc. Systems and methods for contextual retrieval of electronic records
WO2019027696A1 (en) * 2017-08-03 2019-02-07 Motorola Solutions, Inc. Role-based perception filter
CN113239148A (en) * 2021-05-14 2021-08-10 廖伟智 Scientific and technological resource retrieval method based on machine reading understanding
CN114020862A (en) * 2021-11-04 2022-02-08 中国矿业大学 Retrieval type intelligent question-answering system and method for coal mine safety regulations
CN114880439A (en) * 2022-06-09 2022-08-09 同方知网(北京)技术有限公司 Chinese and foreign language literature unified theme retrieval system
WO2022191395A1 (en) * 2021-03-09 2022-09-15 삼성전자주식회사 Apparatus for processing user command, and operating method therefor
CN115309879A (en) * 2022-08-05 2022-11-08 中国石油大学(华东) Multi-task semantic parsing model based on BART
CN116662582A (en) * 2023-08-01 2023-08-29 成都信通信息技术有限公司 Specific domain business knowledge retrieval method and retrieval device based on natural language

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
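唐琳; 郭崇慧; 陈静锋: "A Review of Research on Chinese Word Segmentation Technology", Data Analysis and Knowledge Discovery, no. 1 *
赵璐?; 岁波; 罗海琼; 陈旭; 宋晓霞; 洪平: "Application of a Bidirectional LSTM Neural Network Based on BERT Features in Input Recommendation for Chinese Electronic Medical Records", China Digital Medicine, no. 04 *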

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688220A (en) * 2023-12-12 2024-03-12 山东浪潮科学研究院有限公司 Multi-mode information retrieval method and system based on large language model
CN117493585A (en) * 2023-12-29 2024-02-02 安徽大学 Data retrieval system based on large language model
CN117493585B (en) * 2023-12-29 2024-03-22 安徽大学 Data retrieval system based on large language model
CN117877737A (en) * 2024-03-12 2024-04-12 北方健康医疗大数据科技有限公司 Method, system and device for constructing primary lung cancer risk prediction model
CN117877737B (en) * 2024-03-12 2024-07-05 北方健康医疗大数据科技有限公司 Method, system and device for constructing primary lung cancer risk prediction model

Also Published As

Publication number Publication date
CN117093729B (en) 2024-01-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant