CN118070776A - Physical test question duplicate checking method and system


Info

Publication number
CN118070776A
Authority
CN
China
Prior art keywords
physical test
test questions
question
real time
questions
Legal status
Pending
Application number
CN202410502558.3A
Other languages
Chinese (zh)
Inventor
刘亚中
谢德刚
Current Assignee
Jiangxi Wind Vane Intelligent Technology Co ltd
Original Assignee
Jiangxi Wind Vane Intelligent Technology Co ltd


Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a physical test question duplicate checking method and system in the technical field of data processing. The method comprises the following steps: preprocessing an original physical test question to generate a corresponding target physical test question; detecting, in real time, the sentence vectors contained in the target physical test question through a preset Sentence-BERT model, and matching, in real time, similar test question data corresponding to the target physical test question according to the sentence vectors through a preset HNSW module, so as to perform first dimension duplicate checking; splicing the target physical test question with a preset Prompt template to generate a corresponding standard physical test question, and performing second dimension duplicate checking on the standard physical test question through a preset LLM model; and processing the results of the first dimension and second dimension duplicate checking to judge in real time whether the test question is a repeated test question. The invention greatly saves duplicate checking time and correspondingly improves the user experience.

Description

Physical test question duplicate checking method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a physical test question duplicate checking method and a physical test question duplicate checking system.
Background
With the progress of science and technology, test question databases and education systems have become widespread in schools, where they effectively improve teaching efficiency and examination efficiency.
The number of test questions stored in each school's test question database grows over time, so questions that are already stored may be entered again when new questions are recorded later. As such repeated questions accumulate, the database becomes redundant and the quality of its test questions declines.
To avoid such redundancy, duplicate checking systems for test questions have been developed in the prior art. However, these systems mainly rely on text comparison: every element appearing in the question text must be compared one by one. This takes a long time, lowers duplicate checking efficiency, and degrades the user experience.
Disclosure of Invention
On this basis, the invention aims to provide a physical test question duplicate checking method and system that solve the prior-art problem of long comparison times and the resulting low duplicate checking efficiency.
The first aspect of the embodiment of the invention provides:
a physical test question duplicate checking method, wherein the method comprises the following steps:
Receiving an original physical test question input by a user in real time, and preprocessing the original physical test question to generate a corresponding target physical test question;
Detecting, in real time, the sentence vectors contained in the target physical test question through a preset Sentence-BERT model, and matching, in real time, similar test question data corresponding to the target physical test question according to the sentence vectors through a preset HNSW module, so as to perform first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm;
Splicing the target physical test question with a preset Prompt template to generate a corresponding standard physical test question, and inputting the standard physical test question into a preset LLM model to perform second dimension duplicate checking on it through the preset LLM model;
and judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
The beneficial effects of the invention are as follows: the original physical test question input by the user is received in real time and converted into a target physical test question that is easy to process and identify in subsequent steps. On this basis, first dimension and second dimension duplicate checking are performed on the target physical test question through the preset models and algorithms, so the original question is checked comprehensively without comparing its elements one by one. This greatly saves duplicate checking time, improves duplicate checking efficiency, and improves the user experience.
Further, the step of preprocessing the original physical test question to generate a corresponding target physical test question includes:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
Further, the step of matching, by the preset HNSW module, similar test question data corresponding to the target physical test question in real time according to the sentence vector includes:
When the sentence vector is obtained in real time, invoking the neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And performing retrieval analysis on the sentence vector through the vector retrieval model so as to match, in real time, the similar test question data corresponding to the target physical test question.
Further, the step of performing first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm includes:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
Further, the step of performing the second dimension duplicate checking on the standard physical test question through the preset LLM model includes:
when the standard physical test question is obtained in real time, labeling it through the preset LLM model so as to identify the knowledge points, physical methods, difficulty, physical quantities, solving type and question type contained in the standard physical test question;
and completing the second dimension duplicate checking of the standard physical test question according to the knowledge points, physical methods, difficulty, physical quantities, solving type and question type, wherein the standard physical test question is unique.
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking includes:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the first score and the second score includes:
Displaying the first score and the second score separately, and judging whether each of them is greater than a preset score threshold;
And if both the first score and the second score are greater than the preset score threshold, judging in real time that the original physical test question is a repeated test question, and deleting it.
A second aspect of an embodiment of the present invention proposes:
A physical test question duplicate checking system, wherein the system comprises:
The receiving module is used for receiving the original physical test questions input by the user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
The first duplicate checking module is used for detecting, in real time, the sentence vectors contained in the target physical test question through a preset Sentence-BERT model, and matching, in real time, similar test question data corresponding to the target physical test question according to the sentence vectors through a preset HNSW module, so as to perform first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm;
The second duplicate checking module is used for splicing the target physical test question with a preset Prompt template to generate a corresponding standard physical test question, inputting the standard physical test question into a preset LLM model, and performing second dimension duplicate checking on it through the preset LLM model;
and the judging module is used for judging whether the original physical test question is a repeated test question or not in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
Further, the receiving module is specifically configured to:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
Further, the first duplicate checking module is specifically configured to:
When the sentence vector is obtained in real time, invoking the neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And performing retrieval analysis on the sentence vector through the vector retrieval model so as to match, in real time, the similar test question data corresponding to the target physical test question.
Further, the first duplicate checking module is further specifically configured to:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
Further, the second duplicate checking module is specifically configured to:
when the standard physical test question is obtained in real time, labeling it through the preset LLM model so as to identify the knowledge points, physical methods, difficulty, physical quantities, solving type and question type contained in the standard physical test question;
and completing the second dimension duplicate checking of the standard physical test question according to the knowledge points, physical methods, difficulty, physical quantities, solving type and question type, wherein the standard physical test question is unique.
Further, the judging module is specifically configured to:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
Further, the judging module is specifically further configured to:
Displaying the first score and the second score separately, and judging whether each of them is greater than a preset score threshold;
And if both the first score and the second score are greater than the preset score threshold, judging in real time that the original physical test question is a repeated test question, and deleting it.
A third aspect of an embodiment of the present invention proposes:
A computer comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the physical test question duplicate checking method described above.
A fourth aspect of the embodiment of the present invention proposes:
a readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the physical test question duplicate checking method described above.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flowchart of the physical test question duplicate checking method according to the first embodiment of the present invention;
FIG. 2 is a block diagram of the physical test question duplicate checking system according to the sixth embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to FIG. 1, a physical test question duplicate checking method according to the first embodiment of the present invention is shown. The method of this embodiment greatly saves duplicate checking time, improves duplicate checking efficiency, and improves the user experience.
Specifically, the present embodiment provides:
The physical test question duplicate checking method specifically comprises the following steps:
Step S10, receiving original physical test questions input by a user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
Step S20, detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
Step S30, splicing the target physical test question with a preset Prompt template to generate a corresponding standard physical test question, and inputting the standard physical test question into a preset LLM model to perform second dimension duplicate checking on it through the preset LLM model;
And step S40, judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
Specifically, in this embodiment, it should first be noted that, to complete the duplicate checking of physical test questions objectively and accurately, the invention presets a Sentence-BERT model, an HNSW module and an LLM model. The Sentence-BERT model (a pre-trained sentence embedding model) converts text sentences into corresponding vector representations, so it can be used for natural language processing tasks such as text classification, similarity calculation and clustering. The HNSW (Hierarchical Navigable Small World, a graph-based approximate nearest neighbor search algorithm) module indexes and searches the generated vectors to support the subsequent judgment. Finally, the LLM (large language model) is a deep learning model trained on massive text data; it can generate natural language text and understand text meaning in depth, so it can handle a variety of natural language tasks.
In addition, the duplicate checking method is mainly intended for junior middle school physics test questions, to avoid redundant questions. In practical application, a background server receives, in real time, the original physical test question input by the user, namely the newly written question. To make the original physical test question easy to identify and process, it is immediately preprocessed to generate the required target physical test question.
Further, the target physical test question is input into the Sentence-BERT model to obtain its sentence vectors. The HNSW module is then invoked to match these sentence vectors, that is, to retrieve in real time, from the existing physical test question database, the similar test question data corresponding to the target physical test question. First dimension duplicate checking is then performed on the target physical test question using the similar test question data and a cosine similarity algorithm; this check mainly concerns the semantics of the question. To complete the duplicate checking comprehensively, the target physical test question is then spliced with the preset Prompt template, that is, converted into a standard physical test question in a standard format, which is input into the LLM model for second dimension duplicate checking. Finally, the results of the first dimension and second dimension duplicate checking are compared in real time, and whether the original physical test question is a repeated test question is judged directly from the comparison. Because the elements of the original question no longer need to be compared one by one, duplicate checking time is greatly saved, duplicate checking efficiency is improved, and the user experience is improved.
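The splicing in step S30 can be pictured as filling a placeholder in a text template. The sketch below is a minimal illustration that assumes a hypothetical template string, a hypothetical `<<QUESTION>>` placeholder and a hypothetical function name `build_standard_question`. The patent does not disclose the actual Prompt wording, and the sketch omits the LLM call itself.

```python
# Hypothetical Prompt template: the patent does not disclose the real wording.
PROMPT_TEMPLATE = (
    "You are a junior middle school physics teacher. Label the question below "
    "and reply in JSON with the keys knowledge_points, physical_methods, "
    "difficulty, physical_quantities, solving_type and question_type.\n"
    "Question:\n<<QUESTION>>"
)

PLACEHOLDER = "<<QUESTION>>"  # hypothetical placeholder marker


def build_standard_question(target_question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Splice a cleaned (target) question into the Prompt template.

    str.replace is used instead of str.format so that literal braces in a
    template (for example a JSON sample answer) need no escaping.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template has no <<QUESTION>> placeholder")
    return template.replace(PLACEHOLDER, target_question.strip())


standard = build_standard_question("A ball falls freely for 2 s. Find the distance it falls.")
print(standard)
```

A fixed template with one placeholder keeps the "standard format" identical for every question, so differences in the LLM's labels come from the question rather than from the instructions.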
Second embodiment
Further, the step of preprocessing the original physical test question to generate a corresponding target physical test question includes:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
Specifically, in this embodiment, after the original physical test question is received in real time through the above steps, it is first parsed, that is, the formats of the content it contains are identified. Specifically, the HTML-format test question contained in the original physical test question is extracted in real time.
The HTML-format test question is then cleaned into text suitable for natural language processing. This produces a target physical test question that can be processed as natural language and simplifies subsequent processing.
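Below is a minimal sketch of the cleaning step that uses only Python's standard-library `html.parser`. It assumes the question arrives as an HTML fragment. The tag handling is an illustrative choice, not the patent's procedure: contents of `<script>`/`<style>` are dropped, and a few block-level tags are treated as word breaks. `clean_html_question` is a hypothetical name.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &nbsp;
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("br", "p", "div", "li"):
            self.parts.append(" ")  # block-level tags separate words

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_html_question(raw_html: str) -> str:
    """Turn an HTML-format test question into plain text for NLP processing."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = "".join(parser.parts).replace("\xa0", " ")  # non-breaking space -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace runs


print(clean_html_question("<p>A car moves at <b>10&nbsp;m/s</b>.</p><p>Find its speed.</p>"))
```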
Further, the step of matching, by the preset HNSW module, similar test question data corresponding to the target physical test question in real time according to the sentence vector includes:
When the sentence vector is obtained in real time, invoking the neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And performing retrieval analysis on the sentence vector through the vector retrieval model so as to match, in real time, the similar test question data corresponding to the target physical test question.
Specifically, in this embodiment, after the target physical test question is obtained through the above steps, it is first processed by the Sentence-BERT model to obtain the required sentence vector. A sentence vector is a vector representation of the whole sentence; for example, the vectors of all words in the sentence can be aggregated (summed or averaged) to represent the sentence as a whole. The HNSW module is then invoked, and a vector retrieval model adapted to the current sentence vector is built from the neighbor search graph algorithm contained in the HNSW module. The sentence vector is retrieved and analyzed through this vector retrieval model in real time, and the similar test question data are finally matched in the physical test question database for subsequent processing.
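The sketch below illustrates the two ideas in this paragraph under stated assumptions. The first is mean pooling of word vectors into one sentence vector, which is the pooling commonly used with Sentence-BERT. The second is a k-nearest-neighbour lookup over a question bank. For brevity, the lookup is an exact brute-force search standing in for the HNSW graph index. The question bank, its vectors and all function names are hypothetical.

```python
import math


def mean_pool(word_vectors: list[list[float]]) -> list[float]:
    """Average word/token vectors into a single sentence vector (mean pooling)."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / n for d in range(dim)]


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k_similar(query: list[float], bank: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbour search by L2 distance on unit vectors.

    Brute-force stand-in for an HNSW index. On unit vectors, ranking by
    L2 distance gives the same order as ranking by cosine similarity.
    """
    q = normalize(query)
    scored = []
    for qid, vec in bank.items():
        v = normalize(vec)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
        scored.append((qid, dist))
    scored.sort(key=lambda item: item[1])  # nearest first
    return scored[:k]


# Hypothetical 2-D vectors for three stored questions.
bank = {"q1": [1.0, 0.0], "q2": [0.0, 1.0], "q3": [0.9, 0.1]}
sentence_vec = mean_pool([[1.0, 0.0], [1.0, 0.1]])  # -> [1.0, 0.05]
print(top_k_similar(sentence_vec, bank, k=2))
```

In practice, an HNSW library such as `hnswlib` would replace the brute-force loop. You would build an `hnswlib.Index` with `space="cosine"` and then call `add_items` and `knn_query`, trading exactness for much faster queries on large question banks.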
Third embodiment
Further, the step of performing first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm includes:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
In addition, in this embodiment, after the similar test question data are obtained through the above steps, the target physical test question can be compared directly with each item of the similar test question data, one by one.
During the comparison, an existing cosine similarity algorithm is invoked to score each comparison between the target physical test question and the similar test question data. Whether the semantics of the target physical test question are repeated is then judged in real time from the score, which completes the semantic dimension check of the target physical test question. The subsequent check is carried out only after the semantic check is passed.
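A minimal sketch of the one-by-one cosine scoring follows. Cosine similarity is the standard dot product divided by the product of the vector norms. The 0.9 cut-off is a hypothetical value, not one given in the patent. `semantic_check` returns the best-matching candidate and whether it counts as a semantic duplicate, so the second dimension check would run only when that flag is false.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def semantic_check(target: list[float], candidates: dict[str, list[float]], threshold: float = 0.9):
    """Score the target against each retrieved candidate, one by one.

    Returns (best_id, best_score, is_semantic_duplicate). The default
    threshold is a hypothetical value chosen for illustration.
    """
    best_id, best_score = None, -1.0
    for cid, vec in candidates.items():
        score = cosine_similarity(target, vec)
        if score > best_score:
            best_id, best_score = cid, score
    return best_id, best_score, best_score >= threshold


print(semantic_check([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.1]}))
```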
Further, the step of performing the second dimension duplicate checking on the standard physical test question through the preset LLM model includes:
when the standard physical test question is obtained in real time, labeling it through the preset LLM model so as to identify the knowledge points, physical methods, difficulty, physical quantities, solving type and question type contained in the standard physical test question;
and completing the second dimension duplicate checking of the standard physical test question according to the knowledge points, physical methods, difficulty, physical quantities, solving type and question type, wherein the standard physical test question is unique.
In addition, in this embodiment, after it is determined in real time that the semantics of the target physical test question are not repeated, the second dimension check is performed immediately. That is, it is determined in real time whether the test question factors contained in the target physical test question are repeated. To this end, the target physical test question is converted into the required standard physical test question, and the LLM model labels it, identifying in real time the knowledge points, physical methods, difficulty, physical quantities, solving type and question type it contains. The subsequent duplicate checking is then completed on the basis of these labels.
Fourth embodiment
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking includes:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
In this embodiment, after the first dimension check and the second dimension check are completed through the above steps, two duplicate checking results are obtained. Because the two checks differ in importance, they carry different proportions. A first weight is applied to the first dimension check and a second weight to the second dimension check, preferably thirty percent and seventy percent respectively. The corresponding calculations then yield the first score of the first dimension check and the second score of the second dimension check, from which it is determined in real time whether the original physical test question is a repeated test question. It should further be noted that the result of the second dimension check covers six dimensions: knowledge points, physical methods, difficulty, physical quantities, solving type and question type. A separate weight is set for each of these dimensions, with the six weights summing to 1, and the second score of the second dimension check is finally calculated from them.
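The weighting described above can be sketched as follows. The 30%/70% split comes from this embodiment. The six per-dimension weights are hypothetical values chosen only so that they sum to 1, and mapping the cosine similarity onto a 0-100 scale is an assumption.

```python
FIRST_WEIGHT = 0.30   # weight of the first (semantic) dimension, per this embodiment
SECOND_WEIGHT = 0.70  # weight of the second (label) dimension, per this embodiment

# Hypothetical per-dimension weights: the patent only states that they sum to 1.
DIMENSION_WEIGHTS = {
    "knowledge_points": 0.25,
    "physical_methods": 0.20,
    "difficulty": 0.10,
    "physical_quantities": 0.20,
    "solving_type": 0.15,
    "question_type": 0.10,
}


def label_score(matches: dict) -> float:
    """Weighted sum of per-dimension match values (each in [0, 1]), on a 0-100 scale."""
    if abs(sum(DIMENSION_WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("dimension weights must sum to 1")
    return 100.0 * sum(w * matches.get(dim, 0.0) for dim, w in DIMENSION_WEIGHTS.items())


def weighted_scores(cosine: float, matches: dict) -> tuple:
    """Return (first_score, second_score) after applying the 30%/70% weights."""
    semantic = 100.0 * max(0.0, min(1.0, cosine))  # assumed mapping of cosine to 0-100
    return FIRST_WEIGHT * semantic, SECOND_WEIGHT * label_score(matches)


print(weighted_scores(0.9, {"knowledge_points": 1.0, "difficulty": 1.0, "question_type": 1.0}))
```

Because each score is already weighted, the two scores add up to at most 100. This matches the 100-point full score used in the fifth embodiment.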
Fifth embodiment
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the first score and the second score includes:
The first score and the second score are presented separately, and each is compared with a preset score threshold to judge whether both are greater than that threshold;
And if both the first score and the second score are judged in real time to be greater than the preset score threshold, the original physical test question is judged to be a repeated test question and is deleted.
In this embodiment, it should be noted that, after the first score and the second score have been calculated through the above steps, they are immediately accumulated (summed) to generate a corresponding target score so that an accurate determination can be made.
Further, only the target score needs to be compared with a preset score threshold, which is preferably set to 70 points out of a full score of 100 points. If the target score is judged in real time to be greater than the preset score threshold, the repetition rate of the original physical test question is too high and the question is judged to be a repeated test question; otherwise, its repetition rate is not high and it is not judged to be a repeated test question. In this way, whether the original physical test question is a repeated test question can be determined objectively and accurately, which improves the user experience.
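A minimal sketch of the decision step, assuming the two scores are already on the weighted 100-point scale: `is_duplicate_by_target` follows this embodiment (sum the scores, then compare the target score with 70), while `is_duplicate_by_both` follows the wording of the steps above, in which each score is compared with the threshold separately. All function names, and the in-memory question bank used to model deletion, are hypothetical.

```python
from typing import Dict

SCORE_THRESHOLD = 70.0  # preset threshold from the text, out of a full score of 100 points


def is_duplicate_by_target(first: float, second: float,
                           threshold: float = SCORE_THRESHOLD) -> bool:
    """Embodiment rule: sum both scores into a target score and compare it with the threshold."""
    target_score = first + second
    return target_score > threshold


def is_duplicate_by_both(first: float, second: float,
                         threshold: float = SCORE_THRESHOLD) -> bool:
    """Step wording: each score must individually exceed the threshold."""
    return first > threshold and second > threshold


def admit_question(bank: Dict[str, str], question_id: str, text: str,
                   first: float, second: float) -> bool:
    """Store a new question unless it is judged a repeat; a repeat is discarded (deleted)."""
    if is_duplicate_by_target(first, second):
        return False
    bank[question_id] = text
    return True
```

Under the 30%/70% weighting described in the previous embodiment, a weighted first score can reach at most 30 points, so the per-score comparison only makes sense if each score is measured on its own unweighted scale; the source does not state which scale that variant uses.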
Referring to fig. 2, a sixth embodiment of the present invention provides:
A physical test question duplicate checking system, wherein the system comprises:
The receiving module is used for receiving the original physical test questions input by the user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
The first duplicate checking module is used for detecting, in real time through a preset Sentence-BERT model, the sentence vectors contained in the target physical test question, and for matching, in real time through a preset HNSW module, similar test question data corresponding to the target physical test question according to the sentence vectors, so as to perform first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm;
The second duplicate checking module is used for splicing (concatenating) the target physical test question with a preset prompt template to generate a corresponding standard physical test question, inputting the standard physical test question into a preset LLM model, and performing second dimension duplicate checking on the standard physical test question through the preset LLM model;
and the judging module is used for judging whether the original physical test question is a repeated test question or not in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
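The splicing performed by the second module can be as simple as string concatenation. In the sketch below the template text and the JSON keys it requests are hypothetical; the source only says that a preset prompt template is combined with the cleaned question before the result is sent to the LLM.

```python
# Hypothetical prompt template. The text only states that a preset prompt
# template is spliced with the cleaned question; the wording and the JSON
# keys requested here are illustrative assumptions.
PROMPT_TEMPLATE = (
    "You are a physics teacher. Label the physics question below.\n"
    "Answer with a JSON object containing the keys: knowledge_points, "
    "physical_methods, difficulty, physical_quantities, solving_type, question_type.\n"
    "Question:\n"
)


def build_standard_question(target_question: str,
                            template: str = PROMPT_TEMPLATE) -> str:
    """Splice (concatenate) the prompt template and the cleaned question into one LLM input."""
    return template + target_question.strip()
```

The resulting string is what the text calls the standard physical test question; the LLM call itself is omitted because the source does not name the model or its API.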
Further, the receiving module is specifically configured to:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
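The cleaning step could look like the following sketch built on Python's standard `html.parser` module. The source does not list the cleaning operations, so removing tags, skipping `<script>`/`<style>` content, decoding entities such as `&nbsp;` and collapsing whitespace are assumptions; formulas embedded as images or MathML would need extra handling that is not shown.

```python
import re
from html.parser import HTMLParser
from typing import List


class _QuestionTextExtractor(HTMLParser):
    """Collects the visible text of an HTML question, skipping <script> and <style> content."""

    BLOCK_TAGS = {"br", "p", "div", "li", "tr"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &lt; and &nbsp;
        self.parts: List[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append(" ")  # keep words from separate blocks apart

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def clean_html_question(html_question: str) -> str:
    """Strip tags, decode entities and collapse whitespace (assumed cleaning steps)."""
    parser = _QuestionTextExtractor()
    parser.feed(html_question)
    parser.close()
    text = "".join(parser.parts)
    # In str patterns, \s also matches Unicode whitespace such as the non-breaking space.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `clean_html_question("<p>A car moves at <b>20&nbsp;m/s</b>.</p>")` returns `"A car moves at 20 m/s."`.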
Further, the first duplicate checking module is specifically configured to:
When the sentence vector is obtained in real time, invoking the neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And performing retrieval analysis on the sentence vector through the vector retrieval model so as to match similar test question data corresponding to the target physical test question in real time.
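To show what the vector retrieval model returns, the sketch below performs an exact k-nearest-neighbour search by cosine distance over a small in-memory bank. This brute-force scan is a stand-in, not HNSW itself: an HNSW index navigates a layered proximity graph to return an approximation of the same neighbours without comparing the query against every stored vector. The toy vectors replace real Sentence-BERT embeddings, and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as orthogonal (distance 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(query: Sequence[float],
               index: Dict[str, Sequence[float]],
               k: int) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbour search over a dict of question id -> sentence vector.

    This brute-force scan stands in for the HNSW graph, which returns an
    approximation of the same neighbours far faster on a large question bank.
    """
    scored = ((qid, cosine_distance(query, vec)) for qid, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])
```

In a real deployment the vectors would come from a Sentence-BERT model (for instance via `SentenceTransformer(...).encode(...)` in the sentence-transformers library) and the index from an HNSW library such as hnswlib (`hnswlib.Index(space="cosine", dim=...)` followed by `init_index`, `add_items` and `knn_query`); the source names neither the model checkpoint nor the index parameters.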
Further, the first duplicate checking module is further specifically configured to:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
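The one-by-one comparison and scoring can be sketched as follows. Scoring each retrieved question by cosine similarity follows the text; reporting the single best match as the semantic duplicate checking result is an assumption, since the source does not say how the individual scores are aggregated. Function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_similar_questions(target: Sequence[float],
                            similar: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Compare the target question with each retrieved similar question, one by one."""
    return {qid: cosine_similarity(target, vec) for qid, vec in similar.items()}


def semantic_check(target: Sequence[float],
                   similar: Dict[str, Sequence[float]]) -> Optional[Tuple[str, float]]:
    """Return the most similar question and its score, or None if nothing was retrieved."""
    scores = score_similar_questions(target, similar)
    if not scores:
        return None
    best_id = max(scores, key=scores.__getitem__)
    return best_id, scores[best_id]
```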
Further, the second duplicate checking module is specifically configured to:
when the standard physical test question is obtained in real time, labeling the standard physical test question through the preset LLM model so as to identify the knowledge points, physical methods, difficulty, physical quantities, solving type and question type contained in it;
and completing second dimension duplicate checking of the standard physical test question according to the knowledge points, physical methods, difficulty, physical quantities, solving type and question type, wherein each standard physical test question is unique.
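The six-attribute comparison could be implemented as below, assuming the LLM is asked to answer in JSON. The key names, the lower-casing of labels and the rule that an attribute matches only when both questions carry the same non-empty label set are all hypothetical choices; the source specifies only the six attributes themselves.

```python
import json
from typing import Dict, Set

# Attribute keys are illustrative; the text names the six attributes but not their encoding.
ATTRIBUTES = ("knowledge_points", "physical_methods", "difficulty",
              "physical_quantities", "solving_type", "question_type")


def parse_labels(llm_output: str) -> Dict[str, Set[str]]:
    """Parse a (hypothetical) JSON answer from the LLM into normalised label sets."""
    raw = json.loads(llm_output)
    labels: Dict[str, Set[str]] = {}
    for name in ATTRIBUTES:
        value = raw.get(name, [])
        values = value if isinstance(value, list) else [value]
        labels[name] = {str(v).strip().lower() for v in values if str(v).strip()}
    return labels


def compare_labels(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, bool]:
    """An attribute matches when both questions carry the same non-empty label set (assumed rule)."""
    return {name: bool(a[name]) and a[name] == b[name] for name in ATTRIBUTES}
```

The per-attribute match flags produced here are the kind of input from which a weighted second score, as described earlier, would be computed.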
Further, the judging module is specifically configured to:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
Further, the judging module is further specifically configured to:
presenting the first score and the second score separately, and judging whether each of them is greater than a preset score threshold;
and if both the first score and the second score are greater than the preset score threshold in real time, judging the original physical test question to be a repeated test question and deleting it.
A seventh embodiment of the present invention provides a computer, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the physical test question duplicate checking method described above when executing the computer program.
An eighth embodiment of the present invention provides a readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the physical test question duplicate checking method described above.
In summary, the physical test question duplicate checking method and system provided by the embodiments of the invention can greatly reduce the time spent on duplicate checking, improve duplicate checking efficiency and improve the user experience.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. A physical test question duplicate checking method, characterized by comprising the following steps:
Receiving an original physical test question input by a user in real time, and preprocessing the original physical test question to generate a corresponding target physical test question;
Detecting, in real time through a preset Sentence-BERT model, sentence vectors contained in the target physical test question, and matching, in real time through a preset HNSW module, similar test question data corresponding to the target physical test question according to the sentence vectors, so as to perform first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm;
Splicing (concatenating) the target physical test question with a preset prompt template to generate a corresponding standard physical test question, and inputting the standard physical test question into a preset LLM model to perform second dimension duplicate checking on the standard physical test question through the preset LLM model;
and judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
2. The physical test question duplicate checking method of claim 1, wherein the step of preprocessing the original physical test question to generate a corresponding target physical test question comprises the following steps:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
3. The physical test question duplicate checking method of claim 2, wherein the step of matching, through a preset HNSW module, similar test question data corresponding to the target physical test question in real time according to the sentence vector comprises the following steps:
When the sentence vector is obtained in real time, invoking the neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And performing retrieval analysis on the sentence vector through the vector retrieval model so as to match similar test question data corresponding to the target physical test question in real time.
4. The physical test question duplicate checking method of claim 3, wherein the step of performing first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm comprises the following steps:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
5. The physical test question duplicate checking method of claim 1, wherein the step of performing second dimension duplicate checking on the standard physical test question through the preset LLM model comprises the following steps:
when the standard physical test question is obtained in real time, labeling the standard physical test question through the preset LLM model so as to identify the knowledge points, physical methods, difficulty, physical quantities, solving type and question type contained in it;
and completing second dimension duplicate checking of the standard physical test question according to the knowledge points, physical methods, difficulty, physical quantities, solving type and question type, wherein each standard physical test question is unique.
6. The physical test question duplicate checking method of claim 5, wherein the step of judging in real time whether the original physical test question is a repeated test question according to the duplicate checking results of the first dimension duplicate checking and the second dimension duplicate checking comprises the following steps:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
7. The physical test question duplicate checking method of claim 6, wherein the step of judging in real time whether the original physical test question is a repeated test question according to the first score and the second score comprises the following steps:
presenting the first score and the second score separately, and judging whether each of them is greater than a preset score threshold;
and if both the first score and the second score are greater than the preset score threshold in real time, judging the original physical test question to be a repeated test question and deleting it.
8. A physical test question duplicate checking system, the system comprising:
The receiving module is used for receiving the original physical test questions input by the user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
The first duplicate checking module is used for detecting, in real time through a preset Sentence-BERT model, the sentence vectors contained in the target physical test question, and for matching, in real time through a preset HNSW module, similar test question data corresponding to the target physical test question according to the sentence vectors, so as to perform first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm;
The second duplicate checking module is used for splicing (concatenating) the target physical test question with a preset prompt template to generate a corresponding standard physical test question, inputting the standard physical test question into a preset LLM model, and performing second dimension duplicate checking on the standard physical test question through the preset LLM model;
and the judging module is used for judging whether the original physical test question is a repeated test question or not in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
9. A computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the physical test question duplicate checking method of any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the physical test question duplicate checking method of any one of claims 1 to 7.
CN202410502558.3A 2024-04-25 2024-04-25 Physical test question duplicate checking method and system Pending CN118070776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410502558.3A CN118070776A (en) 2024-04-25 2024-04-25 Physical test question duplicate checking method and system

Publications (1)

Publication Number Publication Date
CN118070776A (en) 2024-05-24

Family

ID=91111663

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051886A (en) * 2021-03-25 2021-06-29 科大讯飞股份有限公司 Test question duplicate checking method and device, storage medium and equipment
CN114610892A (en) * 2020-12-09 2022-06-10 深圳市企鹅网络科技有限公司 Knowledge point annotation method and device, electronic equipment and computer storage medium
JP2024006944A (en) * 2022-06-30 2024-01-17 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Semantic retrieval model training method, apparatus, electronic device, and storage medium

Non-Patent Citations (1)

Title
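孙鹏辉; 邹金霞; 韩婧妍; 曲家锴: "Research on test question similarity based on the vector space model and Word2vec" (基于向量空间模型和Word2vec的试题相似度研究), Information Recording Materials (信息记录材料), no. 04, 1 April 2020 (2020-04-01) *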

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination