CN118070776A - Physical test question duplicate checking method and system - Google Patents
- Publication number
- CN118070776A
- Authority
- CN
- China
- Prior art keywords
- physical test
- test questions
- question
- real time
- questions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention provides a physical test question duplicate checking method and system, relating to the technical field of data processing. The method comprises the following steps: preprocessing an original physical test question to generate a corresponding target physical test question; detecting, in real time, the sentence vector contained in the target physical test question through a preset Sentence-BERT model, matching similar test question data corresponding to the target physical test question in real time according to the sentence vector through a preset HNSW module, and performing first dimension duplicate checking; splicing the target physical test question with a preset Prompt template to generate a corresponding standard physical test question, and performing second dimension duplicate checking on the standard physical test question through a preset LLM model; and processing the results of the first dimension duplicate checking and the second dimension duplicate checking to judge in real time whether the test question is a repeated test question. The invention can greatly reduce duplicate checking time and correspondingly improve the user experience.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a physical test question duplicate checking method and a physical test question duplicate checking system.
Background
With the progress of science and technology, test question databases and education systems have become common in schools, where they effectively improve teaching and examination efficiency.
The number of test questions stored in each school's test question database grows over time, so repeated test questions may be entered when new test questions are recorded. As repeated test questions accumulate, the database becomes redundant, which reduces the quality of the test questions it holds.
To avoid this redundancy, the prior art has developed test question duplicate checking systems. However, these systems mainly rely on test question text comparison, in which every element appearing in the test question text must be compared one by one. The comparison therefore takes a long time, which reduces duplicate checking efficiency and degrades the user experience.
Disclosure of Invention
Based on the above, the invention aims to provide a physical test question duplicate checking method and a physical test question duplicate checking system, so as to solve the problem that the prior art requires a long comparison time, which reduces duplicate checking efficiency.
The first aspect of the embodiment of the invention provides:
a physical examination question duplicate checking method, wherein the method comprises the following steps:
Receiving an original physical test question input by a user in real time, and preprocessing the original physical test question to generate a corresponding target physical test question;
Detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
Performing splicing processing on the target physical test questions and a preset Prompt template to generate corresponding standard physical test questions, and inputting the standard physical test questions into a preset LLM model to perform second dimension duplicate checking on the standard physical test questions through the preset LLM model;
and judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
The beneficial effects of the invention are as follows: the original physical test question input by the user is received in real time and converted into a target physical test question that is easy to process and identify in subsequent steps. On this basis, first dimension duplicate checking and second dimension duplicate checking are performed on the current target physical test question through the preset models and algorithms, so that the current original physical test question is checked for duplication comprehensively. Elements do not need to be compared one by one during duplicate checking, which greatly saves duplicate checking time, improves duplicate checking efficiency, and improves the user experience.
Further, the step of preprocessing the original physical test question to generate a corresponding target physical test question includes:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
Further, the step of matching, by the preset HNSW module, similar test question data corresponding to the target physical test question in real time according to the sentence vector includes:
When the sentence vector is obtained in real time, calling out a neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And carrying out search analysis on the sentence vectors through the vector search model so as to match similar test question data corresponding to the target physical test questions in real time.
Further, the step of performing first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm includes:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
Further, the step of performing the second dimension duplicate checking on the standard physical test question through the preset LLM model includes:
when the standard physical test questions are obtained in real time, marking the standard physical test questions through the preset LLM model so as to respectively identify knowledge points, physical methods, difficulties, physical quantities, solving types and question types contained in the standard physical test questions;
and completing second dimension duplicate checking of the standard physical test questions according to the knowledge points, the physical methods, the difficulty, the physical quantities, the solving types and the question types, wherein the standard physical test questions have uniqueness.
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking includes:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the first score and the second score includes:
The first score and the second score are displayed separately, and whether the first score and the second score are both larger than a preset score threshold value or not is judged respectively;
And if the first score and the second score are both larger than the preset score threshold value in real time, judging the original physical test questions as repeated test questions, and deleting the original physical test questions.
A second aspect of an embodiment of the present invention proposes:
A physical test question duplicate checking system, wherein the system comprises:
The receiving module is used for receiving the original physical test questions input by the user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
The first duplicate checking module is used for detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
The second duplicate checking module is used for performing splicing processing on the target physical test questions and a preset Prompt template to generate corresponding standard physical test questions, inputting the standard physical test questions into a preset LLM model, and performing second dimension duplicate checking on the standard physical test questions through the preset LLM model;
and the judging module is used for judging whether the original physical test question is a repeated test question or not in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
Further, the receiving module is specifically configured to:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
Further, the first duplicate checking module is specifically configured to:
When the sentence vector is obtained in real time, calling out a neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And carrying out search analysis on the sentence vectors through the vector search model so as to match similar test question data corresponding to the target physical test questions in real time.
Further, the first duplicate checking module is further specifically configured to:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
Further, the second duplicate checking module is specifically configured to:
when the standard physical test questions are obtained in real time, marking the standard physical test questions through the preset LLM model so as to respectively identify knowledge points, physical methods, difficulties, physical quantities, solving types and question types contained in the standard physical test questions;
and completing second dimension duplicate checking of the standard physical test questions according to the knowledge points, the physical methods, the difficulty, the physical quantities, the solving types and the question types, wherein the standard physical test questions have uniqueness.
Further, the judging module is specifically configured to:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
Further, the judging module is specifically further configured to:
The first score and the second score are displayed separately, and whether the first score and the second score are both larger than a preset score threshold value or not is judged respectively;
And if the first score and the second score are both larger than the preset score threshold value in real time, judging the original physical test questions as repeated test questions, and deleting the original physical test questions.
A third aspect of an embodiment of the present invention proposes:
A computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the physical test question duplicate checking method as described above when executing the computer program.
A fourth aspect of the embodiment of the present invention proposes:
A readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the physical test question duplicate checking method as described above.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flowchart of a physical test question duplicate checking method according to a first embodiment of the present invention;
FIG. 2 is a block diagram of a physical test question duplicate checking system according to a sixth embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to FIG. 1, a physical test question duplicate checking method according to a first embodiment of the present invention is shown. The method of this embodiment can greatly save duplicate checking time, improve duplicate checking efficiency, and improve the user experience.
Specifically, the present embodiment provides:
The physical test question duplicate checking method specifically comprises the following steps:
Step S10, receiving original physical test questions input by a user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
Step S20, detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
Step S30, performing splicing processing on the target physical test questions and a preset Prompt template to generate corresponding standard physical test questions, and inputting the standard physical test questions into a preset LLM model to perform second dimension duplicate checking on the standard physical test questions through the preset LLM model;
And step S40, judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
Specifically, in this embodiment, it should first be noted that, in order to complete the duplicate checking of physical test questions objectively and accurately, a Sentence-BERT model, an HNSW module and an LLM model are preset in the invention. The Sentence-BERT model (sentence embedding based on pre-training) converts text sentences into corresponding vector representations, so that it can be used for natural language tasks such as text classification, similarity calculation and clustering. The HNSW module (Hierarchical Navigable Small World, a graph-based approximate nearest neighbor (ANN) algorithm) processes the generated vectors accurately to facilitate subsequent judgment. The LLM (large language model) is a deep learning model trained on massive text data; it can generate natural language text and understand the meaning of text in depth, and can therefore handle various natural language tasks. In addition, the method is mainly used for duplicate checking of junior middle school physical test questions so as to avoid test question redundancy. On this basis, in practical application, a server arranged in the background receives in real time the original physical test question input by the user, namely the physical test question as originally designed. Further, to facilitate identification and processing, the current original physical test question is immediately preprocessed to generate the required target physical test question.
Further, the current target physical test question is input into the Sentence-BERT model, which detects the sentence vector contained in it. The HNSW module is then called to match the current sentence vector, that is, to match in real time, within the existing physical test question database, the similar test question data corresponding to the current target physical test question. First dimension duplicate checking is then performed on the current target physical test question using the similar test question data and a cosine similarity algorithm; specifically, the first dimension duplicate checking mainly checks for duplication of test question semantics. On this basis, in order to complete the duplicate checking of the current target physical test question comprehensively, the current target physical test question is spliced with a preset Prompt template, that is, converted into a standard physical test question with a standard test question format. The standard physical test question is then input into the LLM model, which performs second dimension duplicate checking on it. Finally, only the results of the first dimension and second dimension duplicate checking need to be compared in real time, and whether the current original physical test question is a repeated test question can be judged directly from the comparison result. The element-by-element comparison of the original physical test question is thereby omitted, which greatly saves duplicate checking time, improves the duplicate checking efficiency for physical test questions, and improves the user experience.
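The splicing of a target physical test question with a Prompt template can be illustrated with a minimal sketch. The patent does not disclose the template text, so `PROMPT_TEMPLATE` and the function name `build_standard_question` are hypothetical and serve only to show the idea of producing a fixed-format "standard" question for the LLM.

```python
# Hypothetical template: the actual Prompt text is not given in the patent.
PROMPT_TEMPLATE = (
    "You are labelling a junior middle school physics test question.\n"
    "Identify the knowledge point, physical method, difficulty, "
    "physical quantity, solving type and question type.\n"
    "Question:\n{question}\n"
)


def build_standard_question(target_question: str,
                            template: str = PROMPT_TEMPLATE) -> str:
    """Splice the cleaned (target) question into the template.

    The result plays the role of the 'standard physical test question'
    that is passed to the LLM in step S30.
    """
    return template.format(question=target_question.strip())
```

Because every question is wrapped in the same template, the LLM receives input in a uniform format regardless of how the question was originally entered.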
Second embodiment
Further, the step of preprocessing the original physical test question to generate a corresponding target physical test question includes:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
Specifically, in this embodiment, it should be noted that after the required original physical test question is received in real time through the above steps, the current original physical test question is first analyzed, that is, the test question format it contains is identified. Specifically, the HTML-format test question contained in the current original physical test question is extracted in real time.
Further, the current HTML-format test question is immediately cleaned into a text form suitable for natural language processing, that is, the target physical test question that can be processed as natural language is generated, so as to facilitate subsequent processing.
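The cleaning step can be sketched as follows, assuming that "cleaning" means stripping tags, dropping script and style content, decoding character references and normalizing whitespace. The patent does not specify the cleaning rules, so the class `_TextExtractor` and the function `clean_html_question` are illustrative assumptions built on Python's standard `html.parser` module.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script>/<style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &nbsp;, &lt;, ...
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all whitespace runs.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def clean_html_question(raw_html: str) -> str:
    """Turn an HTML-format test question into plain text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()
```

Joining chunks with a space avoids fusing words across block tags, at the cost of an occasional space before punctuation; a real system would likely add rules for formulas and images, which this sketch ignores.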
Further, the step of matching, by the preset HNSW module, similar test question data corresponding to the target physical test question in real time according to the sentence vector includes:
When the sentence vector is obtained in real time, calling out a neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And carrying out search analysis on the sentence vectors through the vector search model so as to match similar test question data corresponding to the target physical test questions in real time.
Specifically, in this embodiment, it should also be noted that, after the required target physical test question is obtained in real time through the above steps, the required sentence vector is first obtained through the Sentence-BERT model. Specifically, the sentence vector is a language representation in which the word vectors corresponding to the words in a sentence can be added up to represent the whole sentence. On this basis, the HNSW module is called, and a vector retrieval model adapted to the current sentence vector is constructed from the neighbor search graph algorithm stored in the HNSW module. The current sentence vector is then retrieved and analyzed in real time through the vector retrieval model, and the similar test question data is finally matched in the physical test question database, so as to facilitate subsequent processing.
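The graph-based retrieval idea can be illustrated with a deliberately simplified sketch. This is not HNSW itself: it uses a single layer, builds the graph by brute force, and returns only one nearest neighbor, whereas HNSW adds a layer hierarchy and a candidate list. It only shows the greedy routing over a neighbor graph that such algorithms rely on. The names `build_graph` and `greedy_search` and the parameter `m` are assumptions for illustration; a production system would more likely use an existing HNSW library.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_graph(vectors, m=2):
    """Link every vector to its m nearest neighbours (edges are bidirectional)."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: (sq_dist(v, vectors[j]), j),
        )
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_search(query, vectors, graph, entry=0):
    """Walk the graph, always moving to the neighbour closest to the query.

    Stops when no neighbour is closer than the current node and returns
    the index of that node (an approximate nearest neighbour).
    """
    current = entry
    current_dist = sq_dist(query, vectors[current])
    while True:
        best = min(
            graph[current],
            key=lambda j: (sq_dist(query, vectors[j]), j),
            default=None,
        )
        if best is None or sq_dist(query, vectors[best]) >= current_dist:
            return current
        current, current_dist = best, sq_dist(query, vectors[best])
```

The search visits only the neighbors along its path rather than every stored vector, which is why graph-based retrieval avoids comparing the new question against the whole database.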
Third embodiment
Further, the step of performing first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm includes:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
In addition, in this embodiment, it should be noted that, after the required similar test question data is obtained through the above steps, the current target physical test question can be compared directly, one by one, with the similar test questions contained in the current similar test question data.
Further, during the real-time comparison, an existing cosine similarity algorithm is called to score the comparison result between the current target physical test question and the current similar test question data, and whether the semantics of the current target physical test question are repeated is judged in real time from the resulting score. The semantic dimension duplicate checking of the current target physical test question is thereby completed, and the subsequent duplicate checking is carried out only after the semantic check passes, so as to facilitate subsequent processing.
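The one-by-one scoring with cosine similarity can be sketched as below. Scaling the best similarity to a 0 to 100 range and taking the maximum over the candidates are assumptions made for illustration; the patent only states that the comparison result is scored with a cosine similarity algorithm.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_dimension_score(target_vec, candidate_vecs):
    """Compare the target with each similar question and keep the best match.

    Assumed convention: the highest similarity, clipped at 0 and scaled
    to 0-100, is used as the raw semantic score.
    """
    if not candidate_vecs:
        return 0.0
    best = max(cosine_similarity(target_vec, c) for c in candidate_vecs)
    return max(0.0, best) * 100.0
```

Only the handful of candidates returned by the retrieval step are scored here, so the cost of this stage does not grow with the size of the question database.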
Further, the step of performing the second dimension duplicate checking on the standard physical test question through the preset LLM model includes:
when the standard physical test questions are obtained in real time, marking the standard physical test questions through the preset LLM model so as to respectively identify knowledge points, physical methods, difficulties, physical quantities, solving types and question types contained in the standard physical test questions;
and completing second dimension duplicate checking of the standard physical test questions according to the knowledge points, the physical methods, the difficulty, the physical quantities, the solving types and the question types, wherein the standard physical test questions have uniqueness.
In addition, in this embodiment, it should be further noted that, after it is determined in real time through the above steps that the semantics of the target physical test question are not repeated, second dimension duplicate checking must immediately be performed on the target physical test question, that is, it is determined in real time whether the several test question factors contained in the current target physical test question are repeated. To this end, the target physical test question is converted into the required standard physical test question, and the current standard physical test question is then labeled through the LLM model: the knowledge points, physical methods, difficulty, physical quantities, solving types and question types contained in the current standard physical test question are labeled in real time, and the subsequent duplicate checking is completed on that basis, so as to facilitate subsequent processing.
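Once the LLM has labeled a question, comparing it with an existing question reduces to comparing six labels. The sketch below assumes the labels have already been parsed into plain strings and that a dimension counts as repeated when the two labels are exactly equal; the patent does not specify the label format or the comparison rule, and `QuestionLabels` and `compare_labels` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QuestionLabels:
    """The six dimensions the LLM is said to label (string format assumed)."""
    knowledge_point: str
    physical_method: str
    difficulty: str
    physical_quantity: str
    solving_type: str
    question_type: str


def compare_labels(new: QuestionLabels, existing: QuestionLabels) -> dict:
    """Return {dimension name: True if both questions carry the same label}."""
    return {
        f.name: getattr(new, f.name) == getattr(existing, f.name)
        for f in fields(QuestionLabels)
    }
```

For example, two questions that agree on everything except difficulty yield five `True` entries and one `False`, which a later scoring step can weight per dimension.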
Fourth embodiment
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking includes:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
In this embodiment, it should be noted that after the first dimension duplicate checking and the second dimension duplicate checking are completed through the above steps, two duplicate checking results are obtained. Because the first dimension and second dimension duplicate checking differ in importance, their proportions in the final result also differ. A first weight is therefore applied to the first dimension duplicate checking and a second weight to the second dimension duplicate checking; preferably, the first weight is set to thirty percent and the second weight to seventy percent. The corresponding calculations then yield a first score for the first dimension duplicate checking and a second score for the second dimension duplicate checking, and whether the original physical test question is a repeated test question is finally determined in real time from the current first score and second score, so as to facilitate subsequent processing. It should be further noted that the result of the second dimension duplicate checking covers six dimensions: knowledge points, physical methods, difficulty, physical quantities, solving types and question types. A separate weight is set for each of these dimensions, and the weights of the six dimensions sum to 1. On this basis, the second score corresponding to the result of the current second dimension duplicate checking can be calculated, so as to facilitate subsequent processing.
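The weighting scheme can be sketched as follows. The 30 percent and 70 percent weights come from the text; the individual values in `DIMENSION_WEIGHTS` are hypothetical, since the patent only requires that the six dimension weights sum to 1, and the 0 to 100 raw scale is likewise an assumption.

```python
FIRST_WEIGHT = 0.30   # first dimension (semantic) weight, from the text
SECOND_WEIGHT = 0.70  # second dimension (label) weight, from the text

# Hypothetical per-dimension weights; the patent only says they sum to 1.
DIMENSION_WEIGHTS = {
    "knowledge_point": 0.30,
    "physical_method": 0.20,
    "difficulty": 0.10,
    "physical_quantity": 0.15,
    "solving_type": 0.15,
    "question_type": 0.10,
}


def second_dimension_raw(matches: dict) -> float:
    """Raw label score on a 0-100 scale: sum of weights of matching dimensions."""
    matched = sum(w for dim, w in DIMENSION_WEIGHTS.items() if matches.get(dim))
    return 100.0 * matched


def weighted_scores(semantic_raw: float, label_raw: float):
    """Apply the 30/70 weights to the two raw scores (each on 0-100)."""
    return FIRST_WEIGHT * semantic_raw, SECOND_WEIGHT * label_raw
```

With these assumed values, a question whose semantic similarity is 90 and whose six labels all match an existing question gets a first score of 27 and a second score of 70.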
Fifth embodiment
Further, the step of determining whether the original physical test question is a repeated test question in real time according to the first score and the second score includes:
The first score and the second score are displayed separately, and whether the first score and the second score are both larger than a preset score threshold value or not is judged respectively;
And if the first score and the second score are both larger than the preset score threshold value in real time, judging the original physical test questions as repeated test questions, and deleting the original physical test questions.
In this embodiment, after the first score and the second score have been calculated through the above steps, the two scores are immediately accumulated to generate a corresponding target score, so that an accurate judgment can be made.
It then suffices to judge directly whether the target score is greater than a preset score threshold; preferably, the preset score threshold is set to 70 points, where the full score is 100 points. If the target score is judged in real time to be greater than the preset score threshold, the repetition rate of the original physical test question is too high and it is judged to be a repeated test question; otherwise, its repetition rate is not high and it is judged not to be a repeated test question. Whether the original physical test question is a repeated test question can thus be judged objectively and accurately, which improves the user experience.
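The accumulation and threshold comparison described in this embodiment can be sketched as below. The threshold of 70 out of 100 is the preferred value from the text; the function name `is_repeated_question` and the assumption that both inputs are already weighted scores on a common 0 to 100 scale are illustrative only.

```python
SCORE_THRESHOLD = 70.0  # preferred threshold from the text, full score is 100


def is_repeated_question(
    first_score: float,
    second_score: float,
    threshold: float = SCORE_THRESHOLD,
) -> bool:
    """Accumulate the two weighted scores and compare against the threshold."""
    target_score = first_score + second_score
    # The text requires the target score to be strictly greater than the threshold.
    return target_score > threshold
```

Under these assumptions, weighted scores of 24 and 63 accumulate to 87 and the question is treated as repeated, while scores of 20 and 50 accumulate to exactly 70 and are not.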
Referring to fig. 2, a sixth embodiment of the present invention provides:
A physical test question duplicate checking system, wherein the system comprises:
The receiving module is used for receiving the original physical test questions input by the user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
The first duplicate checking module is used for detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module, so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
The second duplicate checking module is used for splicing the target physical test questions with a preset prompt template to generate corresponding standard physical test questions, inputting the standard physical test questions into a preset LLM model, and performing second dimension duplicate checking on the standard physical test questions through the preset LLM model;
and the judging module is used for judging whether the original physical test question is a repeated test question or not in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
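The splicing of a target physical test question with a preset prompt template, as performed by the second duplicate checking module, might look like the sketch below. The template wording, the `{question}` placeholder, and the function name `build_standard_question` are hypothetical; the source does not disclose the actual template.

```python
# Hypothetical prompt template; the real template is not given in the source.
PROMPT_TEMPLATE = (
    "Label the following physics test question with its knowledge points, "
    "physical methods, difficulty, physical quantities, solving type and "
    "question type.\n"
    "Question: {question}\n"
    "Return one label list per dimension."
)


def build_standard_question(target_question: str,
                            template: str = PROMPT_TEMPLATE) -> str:
    """Splice the cleaned question text into the prompt template."""
    return template.format(question=target_question.strip())
```

The resulting string is what would be passed to the preset LLM model; the model call itself is omitted because the source does not name a specific model or API.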
Further, the receiving module is specifically configured to:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
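One way to clean an HTML-format test question is sketched below, using only the Python standard library. The choice to drop `script` and `style` content, to join text fragments with single spaces, and the names `_TextExtractor` and `clean_html_question` are assumptions for illustration; the source does not specify the cleaning rules.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text and ignore the contents of script/style elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Joining with a space keeps adjacent blocks apart, at the cost of
        # splitting inline markup such as subscripts into separate tokens.
        return " ".join(self._chunks)


def clean_html_question(raw_html: str) -> str:
    """Strip tags, decode entities and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", parser.text()).strip()
```

The cleaned string then plays the role of the target physical test question passed to the later steps.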
Further, the first duplicate checking module is specifically configured to:
When the sentence vector is obtained in real time, calling out a neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And carrying out search analysis on the sentence vectors through the vector search model so as to match similar test question data corresponding to the target physical test questions in real time.
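The retrieval step can be illustrated with a much-simplified, single-layer greedy walk over a nearest-neighbour graph, which is the basic routine that HNSW-style indexes build on. This is not a full HNSW implementation: it has no layer hierarchy and no candidate list, it builds the graph by brute force, and it uses Euclidean distance. The names `build_graph` and `greedy_search` are hypothetical.

```python
import math


def build_graph(vectors, m=2):
    """Link every vector to its m nearest neighbours (brute-force construction)."""
    graph = {}
    for i, vec in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: math.dist(vec, vectors[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(query, vectors, graph, entry=0):
    """Walk the graph, always moving to the neighbour closest to the query.

    Stops when no neighbour is closer than the current node, so the result
    is a local optimum and may miss the true nearest vector on some graphs.
    """
    current = entry
    while True:
        best = min(
            graph[current],
            key=lambda j: math.dist(query, vectors[j]),
            default=current,
        )
        if math.dist(query, vectors[best]) >= math.dist(query, vectors[current]):
            return current
        current = best
```

In a production setting the sentence vectors would be indexed by a dedicated HNSW library, which returns approximate nearest neighbours without the brute-force construction used here.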
Further, the first duplicate checking module is further specifically configured to:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
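The one-by-one comparison and cosine similarity scoring can be sketched as follows. The cosine formula is standard; the function names, the dictionary layout of the candidates, and the choice to return 0.0 for a zero vector are assumptions made for this example.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def score_candidates(target_vec, candidates):
    """Score each similar question against the target, highest similarity first.

    candidates maps a question id to its sentence vector.
    """
    scored = [
        (question_id, cosine_similarity(target_vec, vec))
        for question_id, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The highest similarity in the returned list could then serve as the raw result of the first dimension duplicate checking.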
Further, the second duplicate checking module is specifically configured to:
when the standard physical test questions are obtained in real time, marking the standard physical test questions through the preset LLM model so as to respectively identify knowledge points, physical methods, difficulties, physical quantities, solving types and question types contained in the standard physical test questions;
and completing second dimension duplicate checking of the standard physical test questions according to the knowledge points, the physical methods, the difficulty, the physical quantities, the solving types and the question types, wherein the standard physical test questions have uniqueness.
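Once the LLM has labelled two questions along the six dimensions, the labels can be compared dimension by dimension. The sketch below uses Jaccard overlap of label sets as the match degree; that measure, the dictionary layout, and the name `dimension_matches` are assumptions, since the source does not state how the labels are compared.

```python
DIMENSIONS = (
    "knowledge_points",
    "physical_methods",
    "difficulty",
    "physical_quantities",
    "solving_type",
    "question_type",
)


def dimension_matches(labels_a: dict, labels_b: dict) -> dict:
    """Return a 0..1 match degree per dimension.

    Each input maps a dimension name to a list of label strings.
    Jaccard overlap is an assumed measure, not one named in the source.
    """
    result = {}
    for name in DIMENSIONS:
        a = set(labels_a.get(name, []))
        b = set(labels_b.get(name, []))
        union = a | b
        # With no labels on either side there is no evidence of overlap.
        result[name] = len(a & b) / len(union) if union else 0.0
    return result
```

These per-dimension match degrees are the kind of input that a weighted sum over the six dimensions would consume to produce the second score.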
Further, the judging module is specifically configured to:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
Further, the judging module is specifically further configured to:
The first score and the second score are displayed separately, and whether the first score and the second score are both larger than a preset score threshold value or not is judged respectively;
And if the first score and the second score are both larger than the preset score threshold value in real time, judging the original physical test questions as repeated test questions, and deleting the original physical test questions.
A seventh embodiment of the present invention provides a computer, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the physical test question duplicate checking method as described above when executing the computer program.
An eighth embodiment of the present invention provides a readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the physical test question duplicate checking method as described above.
In summary, the physical test question duplication checking method and system provided by the embodiment of the invention can greatly save duplication checking time, improve duplication checking efficiency and improve user experience.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (10)
1. The physical test question duplicate checking method is characterized by comprising the following steps of:
Receiving an original physical test question input by a user in real time, and preprocessing the original physical test question to generate a corresponding target physical test question;
Detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
Splicing the target physical test questions with a preset prompt template to generate corresponding standard physical test questions, and inputting the standard physical test questions into a preset LLM model to perform second dimension duplicate checking on the standard physical test questions through the preset LLM model;
and judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
2. The physical test question duplicate checking method of claim 1, wherein: the step of preprocessing the original physical test questions to generate corresponding target physical test questions comprises the following steps:
when the original physical test questions are obtained in real time, analyzing the original physical test questions to extract HTML format test questions contained in the original physical test questions in real time;
And cleaning the HTML-format test questions to correspondingly generate the target physical test questions.
3. The physical test question duplicate checking method of claim 2, wherein: the step of matching similar test question data corresponding to the target physical test question in real time according to the sentence vector through a preset HNSW module comprises the following steps:
When the sentence vector is obtained in real time, calling out a neighbor search graph algorithm contained in the HNSW module;
constructing a corresponding vector retrieval model according to the neighbor search graph algorithm, and correspondingly inputting the sentence vector into the vector retrieval model;
And carrying out search analysis on the sentence vectors through the vector search model so as to match similar test question data corresponding to the target physical test questions in real time.
4. The physical test question duplicate checking method of claim 3, wherein: the step of performing a first dimension duplicate checking on the target physical test question according to the similar test question data and a cosine similarity algorithm comprises the following steps:
When the similar test question data are obtained in real time, comparing the target physical test questions with a plurality of similar test questions contained in the similar test question data one by one;
and scoring the comparison result of the target physical test question and the similar test question data through the cosine similarity algorithm so as to correspondingly complete the semantic dimension duplicate checking of the target physical test question.
5. The physical test question duplicate checking method of claim 1, wherein: the step of performing second dimension duplicate checking on the standard physical test question through the preset LLM model comprises the following steps:
when the standard physical test questions are obtained in real time, marking the standard physical test questions through the preset LLM model so as to respectively identify knowledge points, physical methods, difficulties, physical quantities, solving types and question types contained in the standard physical test questions;
and completing second dimension duplicate checking of the standard physical test questions according to the knowledge points, the physical methods, the difficulty, the physical quantities, the solving types and the question types, wherein the standard physical test questions have uniqueness.
6. The physical test question duplicate checking method of claim 5, wherein: the step of judging whether the original physical test question is a repeated test question in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking comprises the following steps:
calculating a first score corresponding to the duplicate checking result of the first dimension duplicate checking and a second score corresponding to the duplicate checking result of the second dimension duplicate checking in real time based on a preset rule;
and judging whether the original physical test question is a repeated test question or not in real time according to the first score and the second score.
7. The physical test question duplicate checking method of claim 6, wherein: the step of judging whether the original physical test question is a repeated test question according to the first score and the second score in real time comprises the following steps:
The first score and the second score are displayed separately, and whether the first score and the second score are both larger than a preset score threshold value or not is judged respectively;
And if the first score and the second score are both larger than the preset score threshold value in real time, judging the original physical test questions as repeated test questions, and deleting the original physical test questions.
8. A physical test question duplicate checking system, the system comprising:
The receiving module is used for receiving the original physical test questions input by the user in real time, and preprocessing the original physical test questions to generate corresponding target physical test questions;
The first duplicate checking module is used for detecting sentence vectors contained in the target physical test questions in real time through a preset Sentence-BERT model, and matching similar test question data corresponding to the target physical test questions in real time according to the sentence vectors through a preset HNSW module, so as to perform first dimension duplicate checking on the target physical test questions according to the similar test question data and a cosine similarity algorithm;
The second duplicate checking module is used for splicing the target physical test questions with a preset prompt template to generate corresponding standard physical test questions, inputting the standard physical test questions into a preset LLM model, and performing second dimension duplicate checking on the standard physical test questions through the preset LLM model;
and the judging module is used for judging whether the original physical test question is a repeated test question or not in real time according to the duplicate checking result of the first dimension duplicate checking and the second dimension duplicate checking.
9. A computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the physical test question duplicate checking method of any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the physical test question duplicate checking method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410502558.3A CN118070776A (en) | 2024-04-25 | 2024-04-25 | Physical test question duplicate checking method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118070776A true CN118070776A (en) | 2024-05-24 |
Family
ID=91111663
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410502558.3A Pending CN118070776A (en) | 2024-04-25 | 2024-04-25 | Physical test question duplicate checking method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118070776A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051886A (en) * | 2021-03-25 | 2021-06-29 | 科大讯飞股份有限公司 | Test question duplicate checking method and device, storage medium and equipment |
CN114610892A (en) * | 2020-12-09 | 2022-06-10 | 深圳市企鹅网络科技有限公司 | Knowledge point annotation method and device, electronic equipment and computer storage medium |
JP2024006944A (en) * | 2022-06-30 | 2024-01-17 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Semantic retrieval model training method, apparatus, electronic device, and storage medium |
Non-Patent Citations (1)
Title |
---|
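Sun Penghui; Zou Jinxia; Han Jingyan; Qu Jiakai: "Research on test question similarity based on vector space model and Word2vec", Information Recording Materials (信息记录材料), no. 04, 1 April 2020 (2020-04-01) * |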
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110442718B (en) | Statement processing method and device, server and storage medium | |
CN109670191B (en) | Calibration optimization method and device for machine translation and electronic equipment | |
CN113191148B (en) | Rail transit entity identification method based on semi-supervised learning and clustering | |
CN112163424A (en) | Data labeling method, device, equipment and medium | |
CN111563384A (en) | Evaluation object identification method and device for E-commerce products and storage medium | |
CN112100377B (en) | Text classification method, apparatus, computer device and storage medium | |
CN113962219A (en) | Semantic matching method and system for knowledge retrieval and question answering of power transformer | |
CN111597356A (en) | Intelligent education knowledge map construction system and method | |
CN111177351A (en) | Method, device and system for acquiring natural language expression intention based on rule | |
CN112765974A (en) | Service assisting method, electronic device and readable storage medium | |
CN114647713A (en) | Knowledge graph question-answering method, device and storage medium based on virtual confrontation | |
US10380490B1 (en) | Systems and methods for scoring story narrations | |
CN107844531B (en) | Answer output method and device and computer equipment | |
CN111259115B (en) | Training method and device for content authenticity detection model and computing equipment | |
CN117573985B (en) | Information pushing method and system applied to intelligent online education system | |
CN111125443A (en) | On-line updating method of test question bank based on automatic duplicate removal | |
CN110765241A (en) | Super-outline detection method and device for recommendation questions, electronic equipment and storage medium | |
CN113705207A (en) | Grammar error recognition method and device | |
CN112116181B (en) | Classroom quality model training method, classroom quality evaluation method and classroom quality evaluation device | |
CN114707507B (en) | List information detection method and device based on artificial intelligence algorithm | |
CN116186219A (en) | Man-machine dialogue interaction method, system and storage medium | |
CN118070776A (en) | Physical test question duplicate checking method and system | |
CN115964484A (en) | Legal multi-intention identification method and device based on multi-label classification model | |
CN112989040B (en) | Dialogue text labeling method and device, electronic equipment and storage medium | |
CN112989001B (en) | Question and answer processing method and device, medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||