CN118095453A - Self-adaptive evaluation system for cognitive ability - Google Patents


Info

Publication number
CN118095453A
Authority
CN
China
Prior art keywords
test
tested
parameter
evaluation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410327398.3A
Other languages
Chinese (zh)
Inventor
杨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202410327398.3A
Publication of CN118095453A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Tourism & Hospitality (AREA)
  • Pure & Applied Mathematics (AREA)
  • Economics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Technology (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an adaptive evaluation method for cognitive ability, termed a hierarchically adaptive optimized item response theory (IRT) model. If an examinee is taking the initial test, the initial prior distribution of the examinee's ability is given an initial value by a Bayesian hierarchical model; if testing has been completed on n-1 examinees and the n-th examinee, having completed t-1 trials, is to take the t-th trial, the ability prior distribution is assigned from the posterior distribution obtained after trial t-1. The method then solves for the candidate design that maximizes the mutual-information utility of the expected utility function and administers the corresponding test question to the examinee; calculates the distribution value of each unknown parameter in the IRT model; obtains an observation result; and computes the posterior distribution of the examinee's ability. It then judges whether the current total amount of information meets the termination condition: if the information exceeds the threshold, the test ends and the result is obtained; otherwise the posterior distribution of the examinee's trait is taken as the prior distribution for the next question, and the loop continues.

Description

Self-adaptive evaluation system for cognitive ability
Technical Field
The invention relates to the technical fields of psychological measurement and computer technology; its theoretical architecture belongs to cognitive science, psychology and linguistics, and its form of presentation relates to the technical field of game animation.
Background
1. Theoretical background
Cognitive ability refers to the ability of the human brain to process, store and retrieve information, that is, what we generally call intelligence, such as observation, memory and imagination. People recognize the objective world and acquire knowledge mainly through their cognitive ability, which refers to the ability to learn, study, understand, generalize and analyze; from an information-processing standpoint, it is the ability to receive, process, store and apply information. Gagné, in his classification of learning outcomes, proposed three cognitive abilities: verbal information, intellectual skills and cognitive strategies.
Cognitive ability is the most important psychological condition for a person to successfully complete an activity. Perception, language, memory, attention, thinking and imagination are all considered cognitive abilities: the brain's capacity to process, store and retrieve information, that is, people's ability to grasp the composition of things, their properties, their relations to other things, their driving forces, direction of development and basic laws. A person's cognitive character has a significant impact on socioeconomic performance, and enhanced cognitive ability has also been found to be associated with increased wealth and life expectancy.
A cognitive ability test measures a person's ability to learn and complete tasks. Such tests are particularly suited to selecting among groups of inexperienced candidates; the abilities relevant to a job may be categorized as reading, computation, perceptual speed, spatial ability and reasoning.
It is generally believed that a complete cognitive ability assessment system must be built upon a mature framework of cognitive theory. However, the theoretical framework of current domestic cognitive ability evaluation is imperfect, existing assessment tools are overly simple, and their results cannot be calibrated on the same scale as the examinee's true cognitive ability. Notably, the evaluation system can be widely applied to many cognitive abilities; the invention is described in detail below taking reading ability as a case study:
Language and reading ability are important components of, and indicators for, human brain development and cognitive development, and play an extremely important role in human evolution and individual development. Reading ability is key to the development of imagination, concentration, reasoning, thought planning and fluency of thought, and shows a marked positive correlation with learning ability and academic performance during the school years. Scientific assessment of children's reading ability is therefore an important way to track the development of their abilities in all respects.
Congenital genetic factors, cognitive development, language environment, home environment and the like all affect human language development and thus the related cognitive and social abilities. Differences in these abilities require scientific measurement tools for assessment and effective intervention to ensure the healthy development of human language and cognition.
Overseas, the Common European Framework of Reference for Languages, developed jointly by the more than forty member states of the Council of Europe, provides a guiding framework and assessment tools for language teaching, learning and reading-ability assessment. In addition, English reading assessment includes the work of the American Council on the Teaching of Foreign Languages (ACTFL): the Oral Proficiency Interview (OPI), the Writing Proficiency Test (WPT), the Advanced Placement (AP) Chinese Language and Culture Test, and the SAT II Chinese Subject Test (Liu, 2017). Existing mainstream Chinese-language assessments include the Chinese Proficiency Test (Hanyu Shuiping Kaoshi, HSK; Teng, 2017), among others.
At present, assessment tools lack a clear account of the relations among perception, pronunciation, orthography, cognitive control and other functions; the coverage and sub-functions of language ability are unclear; and reading ability should also involve visual ability, statistical learning, working memory, cognitive control, attention and related processes, including non-linguistic input and output modules. Thus there is still no objective, systematic and comprehensive measurement tool that reflects the true level of reading-ability development.
2. Background of evaluation technique
At present, domestic measurement technology is in a relatively lagging state: most measurement tools still use Classical Test Theory (CTT). Its core concepts include the true score, reliability and validity. Based on Pearsonian statistics, CTT focuses on the examinee's overall behavior in a test, and its parallel-test assumption contains an obvious logical weakness. An examinee must answer every question to obtain an evaluation, and is therefore forced to answer many items that contribute nothing to estimating his or her ability, so evaluation takes too long and efficiency is low. A whole test has only a single reliability index, which cannot reflect the different weights of items of different difficulty and discrimination, so the ability gaps between examinees are distorted, and performance on individual items is not adequately used. In terms of robustness, the evaluation depends on the specific examinees and the specific items of the test: all examinees must take the same paper at the same time for comparison, which brings a high risk of question leakage and low flexibility, and the performance indices estimated for tests and items depend on the specific sample tested, so the same test applied to different samples yields different indices. Classical test theory thus has a number of significant limitations in measurement applications; the present invention targets these limitations with methodological innovations, and the background of the new techniques used by the test system is described in detail below:
2.1 Computerized adaptive testing (Computerized Adaptive Testing, CAT)
In a computerized adaptive test, the computer automatically selects suitable questions from an item bank; after each answer, it re-evaluates the examinee's current ability level and then selects the question best matched to that level to continue the test, until a stopping criterion is reached. Many well-known international tests use computerized adaptive forms, such as the GRE, ASVAB and GMAT, and CAT is likely to find ever wider application.
The traditional CAT test is established by the following steps:
2.1.1 question bank construction
A typical CAT program first collects a large number of items measuring the target ability and a large sample of examinees spanning a wide ability range, administers the items, fits the responses with a statistical model such as item response theory, and estimates each item's parameters, e.g., difficulty, discrimination and guessing parameters. Once the item bank is built, these parameters are fixed. Examinees in the formal test are then tested adaptively against the bank with these fixed parameters, and the item-selection and convergence criteria also refer to the fixed parameters.
2.1.2 Topic selection strategy
The amount of information is the main index and decision strategy used in adaptive testing to select the next suitable question in real time from the examinee's answers. Conventional CAT selects the item that provides the greatest information gain under the current response pattern, so that the test converges quickly to an optimal (or locally optimal) value. The disadvantages are: (1) the item with the greatest information gain is selected repeatedly, so most items in the bank are never used and the bank is easily leaked; (2) when the prior distribution of ability is unclear, or the response data are too few to determine the examinee's actual ability, rapid convergence can drive the measurement away from the true value; and (3) without other auxiliary parameters to help correct the fitting direction, the measurement easily deviates from the true value.
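The greedy max-information selection described above can be sketched as follows; the item bank, its parameter values and the current ability estimate are illustrative assumptions (a, b, c correspond to the discrimination, difficulty and guessing parameters), not data from the invention.

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

# Hypothetical item bank: rows are (discrimination a, difficulty b, guessing c).
bank = np.array([
    [1.2, -1.0, 0.20],
    [0.8,  0.0, 0.25],
    [1.5,  0.5, 0.20],
    [1.0,  2.0, 0.15],
])

theta_hat = 0.4  # current ability estimate for the examinee
info = np.array([item_information(theta_hat, a, b, c) for a, b, c in bank])
best = int(np.argmax(info))  # greedy choice: the single most informative item
```

Because the same few high-information items keep winning this argmax across examinees, a bank used this way wears out quickly, which is exactly drawback (1) above.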
2.1.3 Evaluation of the subject Property level
The examinee's trait level is re-estimated after each completed item, commonly by the following methods:
Conditional maximum likelihood estimation: the most widely used, but it fails when the responses are all wrong or all correct, so it cannot be used at the start of a test or for examinees with extreme response accuracy.
Bayesian expected a posteriori (EAP) estimation: takes less time, but a suitable prior distribution is difficult to select.
Conventional CAT has not fully resolved the optimal solution for trait-level estimation.
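As a sketch of the EAP alternative, the following grid-based estimator remains finite even for all-correct or all-wrong response patterns, where maximum likelihood diverges; the items, responses and standard-normal prior are hypothetical choices for illustration.

```python
import numpy as np

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(responses, items, grid=None):
    """Expected a posteriori ability estimate on a discretized theta grid,
    with a standard-normal prior over theta."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 161)
    prior = np.exp(-0.5 * grid**2)          # unnormalized N(0, 1) prior
    like = np.ones_like(grid)
    for u, (a, b, c) in zip(responses, items):
        p = p3pl(grid, a, b, c)
        like = like * (p if u == 1 else 1.0 - p)
    post = prior * like
    post = post / post.sum()                # normalize the posterior
    return float(np.sum(grid * post))       # posterior mean = EAP estimate

items = [(1.0, -0.5, 0.2), (1.2, 0.5, 0.2), (0.9, 1.0, 0.25)]
theta_all_right = eap_estimate([1, 1, 1], items)  # finite, unlike ML
theta_all_wrong = eap_estimate([0, 0, 0], items)  # finite, unlike ML
```

The prior keeps the estimate bounded for extreme patterns, at the cost of having to choose that prior well, which is the trade-off named above.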
2.1.4 Suspension rules
The stopping rule is the criterion for ending the evaluation; if the amount of information described in section 2.1.2 is used as the index, a threshold (e.g., total information > 25) is defined to stop the test.
2.2 Item response theory (Item Response Theory, IRT):
All item parameters in the bank and the ability-evaluation parameters are based on Item Response Theory (IRT), also called latent trait theory or the latent trait model, which greatly improves evaluation efficiency and robustness and gives the evaluation results reference value.
Item response theory is a modern psychometric theory whose significance lies in guiding item screening and test construction. It assumes that the examinee has a "latent trait", a statistical notion proposed on the basis of observing and analyzing test responses; a latent trait generally refers to a latent ability, and the total test score is often used as an estimate of it. Item parameters established under item response theory are invariant, meaning that scores from different measurement scales can be unified. These superior statistical properties largely remedy the statistical and evaluative shortcomings of the Classical Test Theory (CTT) commonly used in the past.
The basic concept of the project reaction theory is as follows:
It describes, in the form of a probability function, how the outcome of an item response is jointly affected by the examinee's ability level and the item's characteristic parameters. Unlike traditional statistical test methods, item response theory yields metrological parameters for each item as well as ability-level parameters for each examinee (all contained in the item characteristic curve). These parameters help the item designer view item difficulty and examinee ability in the same frame of reference from a metrological perspective.
Estimation targets: ability level and latent psychological trait level (latent traits)
Item characteristic curve (Item Characteristic Curve, ICC), here in the three-parameter form with difficulty β_i, discrimination α_i and guessing parameter c_i: P_i(θ) = c_i + (1 - c_i) / (1 + e^(-α_i(θ - β_i)))
2.3 Adaptive design optimization
Measurement accuracy is important in psychological and behavioral measurement to ensure sound model inference later. Efficient measurement techniques are equally important when observations are expensive or procedures are very time-consuming. Adaptive design optimization methods exploit the sequential ordering of trials to extract as much information as possible from the data of a whole test session, aiming to guarantee both measurement accuracy and test efficiency; traditional lengthy fixed designs are abandoned in favor of actively collecting the data that best support inference (Lindley, 1956; Chernoff, 1959; Kiefer, 1959; Box & Hill, 1967). Since data collection is in most cases sequential, the best design plans the next measurement from the immediate feedback of each data point, using the information obtained from previous measurements to choose each new one so as to maximize the information gained about the process and behavior under study. With growing computing power, adaptive design optimization has shone in cognitive neuroscience, psychology, statistics, education and other fields.
Adaptive design optimization is a Bayesian sequential optimization algorithm executed during the experiment. Specifically, in each trial, given the current state of knowledge about the phenomenon under study (the prior distribution) and a statistical model of the data, the design with the highest expected utility (defined below) is determined; the experiment is then run with this optimal design, i.e., the item with the larger information gain, and the result is observed and recorded. The observations are used to derive a posterior distribution, which becomes the prior for the next measurement. This cycle of optimizing the design, measuring, and updating the single-stage data model repeats throughout the test until a suitable stopping criterion is met, finally yielding the examinee's trait-level estimate.
In adaptive design optimization, the most critical step is to combine the prior distribution and solve for the design with the highest expected utility; this determines which item should be selected given the examinee's current characteristics, so that the trial yields the most information while avoiding inefficient items too far from the examinee's ability. The expected utility can be written as:

U(d_t) = ∫∫ u(d_t, θ, y^(t)) p(y^(t) | θ, d_t) p(θ | y^(1:t-1)) dθ dy^(t),  d_t* = arg max U(d_t)

where θ is a parameter of the data (measurement) model that predicts the observed data for given parameters; y^(1:t) is the union of the past measurements y^(1:t-1), from the first trial to trial t-1, and the output y^(t) of the current trial; d_t is a candidate design, i.e., a candidate item; p(θ | y^(1:t-1)) is the posterior distribution after the past t-1 trials and also the prior for the current trial; and p(y^(t) | θ, d_t) is the conditional distribution of the result y^(t) given the current prior trait θ and the candidate design d_t. The utility of candidate design d_t is measured by the sample utility function u(d_t, θ, y^(t)), and U(d_t) is the expected utility function, the expectation of the sample utility over the data distribution and the prior. Its maximizer d_t* gives the greatest expected information gain about the relevant model parameters once the measurement is observed. After the optimal design is administered, the measurement y^(t) is obtained and the posterior p(θ | y^(1:t)) is computed; this posterior then serves as the prior in the optimal design at the start of the next item. It is worth mentioning that adaptive design optimization can be carried out both before the test starts and during the test.
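On a discretized grid, the mutual-information form of U(d_t) for a binary response can be computed as below; the prior, grid and candidate items are illustrative assumptions, not calibrated values.

```python
import numpy as np

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def mutual_information_utility(prior, grid, item):
    """U(d): mutual information between the binary response y and theta,
    computed on a discretized theta grid for one candidate item d."""
    a, b, c = item
    w = prior / prior.sum()          # normalized p(theta | y^(1:t-1))
    p1 = p3pl(grid, a, b, c)         # p(y=1 | theta, d)
    m1 = float(np.sum(w * p1))       # marginal p(y=1 | d)
    u = 0.0
    for py, my in ((p1, m1), (1.0 - p1, 1.0 - m1)):
        u += float(np.sum(w * py * np.log(py / my)))
    return u

grid = np.linspace(-4.0, 4.0, 161)
prior = np.exp(-0.5 * grid**2)       # current belief about theta, unnormalized
bank = [(1.2, -1.0, 0.20), (1.5, 0.0, 0.20), (1.0, 2.5, 0.15)]
utilities = [mutual_information_utility(prior, grid, d) for d in bank]
d_star = int(np.argmax(utilities))   # d_t* = arg max U(d_t)
```

With a prior centered at zero, the well-discriminating item of middling difficulty wins, while the item far out in the tail contributes little expected information.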
However, adaptive design optimization only optimizes the measurement process at the level of a single test, rather than using the information in the data collected across all past test sessions.
2.4 Bayesian hierarchical model
Hierarchical Bayesian modeling is another approach to improving inference efficiency and accuracy (Gelman, Carlin, Stern, & Rubin, 2004; Jordan, 1998; Koller & Friedman, 2009; Rouder & Lu, 2005). It strives to capture the structure of the data-generating population (e.g., the type of population an individual belongs to) in order to infer an individual's properties from the available measurements, the motivation being that datasets may contain information about one another even when they were not generated by the same individual. Hierarchical modeling provides a statistical framework for exploiting this mutual information.
The Bayesian hierarchical model not only provides a flexible framework for incorporating information from previous examinees, but is also well suited to the existing Bayesian adaptive design optimization paradigm, achieving higher measurement efficiency.
The basic idea behind the Bayesian hierarchical model is to exploit statistical dependencies in the data to improve the accuracy of inference (e.g., the statistical efficiency of the test). If past examinees are random draws from some population, measurements from a new individual drawn from the same population are likely to resemble the others. In this case, adaptive inference gains more by taking the population's data structure into account than by starting with no such information: the datasets, taken as a set, inform one another and yield more accurate inference. Since each dataset requires its own model (the measurement model), the statistical relations between datasets must be modeled at a separate, higher level, hence the models are hierarchical.
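A minimal numerical illustration of this pooling idea, using a normal-normal model with an assumed per-subject noise variance (all numbers invented): each subject's estimate is shrunk toward the population mean, so every individual inference borrows strength from the others' data.

```python
import numpy as np

# Invented per-subject ability estimates and an assumed sampling variance.
obs = np.array([-1.8, -0.3, 0.4, 1.1, 2.2])
se2 = 0.5**2                               # within-subject noise variance

mu = obs.mean()                            # empirical population mean
tau2 = max(obs.var() - se2, 1e-6)          # between-subject variance estimate
shrink = tau2 / (tau2 + se2)               # shrinkage factor in (0, 1)

# Each subject's posterior mean is pulled toward the population mean mu,
# more strongly when the subject's own data are noisy (small shrink).
posterior_means = mu + shrink * (obs - mu)
```

The shrunken estimates keep the same center but are less spread out than the raw ones, which is the gain in statistical efficiency the hierarchical structure provides.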
3. Evaluation form background
The system of the invention recognizes that children's minds and attention are not fully mature, and that accidents such as mistaken choices, blind guessing and distraction easily occur in a traditional lengthy evaluation. The evaluation framework is therefore designed to combine easily with game-script logic: the evaluation process gains the interest of a game, helping users keep their attention focused on the evaluation items and complete the assessment with greater concentration, so that the result is closer to their true ability level.
Defects of the traditional evaluation system:
Theoretical aspects:
(1) The true level of students' Chinese reading ability is not clearly known at present. The functions of existing evaluation tools are very limited: they target only some aspects of text reading and writing, their content is thin, and their frameworks are vague, without comprehensive integration of theories from psychology and cognitive science. There is still no objective, systematic and comprehensive measurement tool reflecting the true level of children's reading-ability development.
(2) A key "window of opportunity" for the development of children's language and related cognitive functions in the context of Chinese social culture, together with the learning trigger mechanism, developmental trajectory and potential influencing factors, needs to be identified (e.g., Kuhl, 2011);
(3) Combining assessment data with brain-function measurements, a more scientific and standardized screening scheme should be formulated to intervene in or prevent language-related developmental disorders in infants and young children as early as possible (Gabrieli, 2009).
The technical aspect of evaluation:
Technical and application defects of traditional evaluation tools built on classical test theory:
(1) The questioning strategy cannot be adjusted to the examinee's ability; the result is obtained only after all items are answered in one sitting, so measurement takes too long and efficiency is low.
(2) The weights of items of different difficulty and discrimination cannot be reflected, so the ability gaps between examinees are distorted.
(3) The evaluation depends on the specific examinees and the specific items of the test; all examinees must take the same paper at the same time for comparison, which brings a high risk of question leakage and low flexibility.
(4) The performance indices estimated for tests and items depend on the specific sample tested; the same test applied to different samples yields different indices.
(5) Examinee ability and item difficulty are measured in different frames of reference.
Technical defects of CAT systems built on the traditional IRT model:
(6) A large amount of time and cost must be spent up front collecting large samples to build the item bank and estimate the parameters of all its items.
(7) System extensibility is poor: once item parameters are estimated they are fixed in subsequent testing, and if the bank must later be updated or the tested population changes, the bank-construction step must be repeated.
(8) A selection strategy based on information gain alone can leave most items in the bank unselected, so a few items are measured repeatedly and the bank risks leakage.
(9) Under an information-gain selection strategy, careless answers greatly affect the accuracy of the test result when no additional parameters constrain the estimate.
(10) When the IRT model estimates parameters, the inability to supply a suitable prior distribution at the start of the test may make the result converge to a local optimum, affecting the model's accuracy.
Evaluation form aspect:
(1) Traditional evaluation tools are presented in a boring, rigid form lacking novelty; children are easily distracted and prone to mistaken choices and blind guessing, so the evaluation result fails to reflect their true ability level.
Disclosure of Invention
1. Detailed description of the invention:
The invention provides an adaptive evaluation method for cognitive ability, termed a hierarchically adaptive optimized item response theory model, comprising the following steps:
Step 1, if the examinee is taking the initial test, the initial prior distribution of ability is given an initial value by a Bayesian hierarchical model; if testing has been completed on n-1 examinees and the n-th examinee, having completed t-1 trials, is to take the t-th trial, the ability prior distribution is assigned from the posterior distribution obtained after trial t-1;
Step 2, solve for the candidate design maximizing the mutual-information utility of the expected utility function, and administer the corresponding item to the examinee;
Step 3, calculate each parameter in the IRT model. Available estimation methods include joint maximum likelihood estimation, marginal maximum likelihood estimation (the EM algorithm) and Bayesian expected a posteriori estimation. Here the expected a posteriori value is computed with a Bayesian EAP three-parameter IRT model, and the posterior distribution of the unknown parameters is constructed;
Step 4, obtaining an observation result;
Step 5, calculating posterior distribution of the tested capacity;
Step 6, judging whether the current total information quantity meets the termination condition, if the current information quantity is greater than the threshold condition, ending the test, and obtaining an observation result; if the current information quantity does not meet the termination condition, returning to the step 1, and taking the posterior distribution of the tested characteristics as the prior distribution to be tested in the next question test, and continuing to circulate;
Step 7. When one subject finishes the test, the Bayesian hierarchical model over the subject space is updated so that an appropriate initial value is supplied at the start of the next subject's test. The update expression is:
p(θ_{n+1} | y_{1:n}) = p(θ_n | y_{1:n-1});
where
p(θ_{n+1} | y_{1:n}) = (1 / p(y_{1:n})) ∫ p(θ_{n+1} | η) [ ∏_{i=1}^{n} ∫ p(y_i | θ_i) p(θ_i | η) dθ_i ] p(η) dη;
here p(η) is the prior distribution of the higher-level model parameter η, and the marginal distribution p(y_{1:n}) is obtained by integrating the above expression over θ_{1:n} and η.
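Steps 1-6 of the method can be sketched as a single-subject loop. The sketch below is a minimal grid-based illustration, not the patented implementation: item selection uses Fisher information at the posterior mean as a simple stand-in for the mutual-information utility, and the item parameters, threshold value, and helper names are assumptions.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Standard three-parameter logistic item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = icc_3pl(theta, a, b, c)
    return a ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def run_adaptive_test(items, true_theta, grid, prior, info_threshold=3.0, seed=0):
    """Steps 1-6: start from a prior on an ability grid, pick items greedily,
    observe simulated responses, update the posterior, stop on information."""
    rng = np.random.default_rng(seed)
    posterior = prior.copy()
    used = []
    while len(used) < len(items):
        theta_hat = float(np.sum(grid * posterior))      # current ability estimate
        # Step 2 (stand-in): choose the unused item most informative at theta_hat.
        best = max((i for i in range(len(items)) if i not in used),
                   key=lambda i: fisher_info(theta_hat, *items[i]))
        used.append(best)
        a, b, c = items[best]
        y = rng.random() < icc_3pl(true_theta, a, b, c)  # Step 4: observation
        # Step 5: Bayesian update of the ability posterior on the grid.
        like = icc_3pl(grid, a, b, c) if y else 1 - icc_3pl(grid, a, b, c)
        posterior = posterior * like
        posterior /= posterior.sum()
        # Step 6: terminate once accumulated information exceeds the threshold.
        if sum(fisher_info(theta_hat, *items[i]) for i in used) > info_threshold:
            break
    return posterior, used
```

Step 7 would then fold this subject's final posterior back into the group-level model before the next subject begins.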
In order to avoid re-modeling the IRT model when the question bank is expanded, the invention further performs parameterized modeling of each item in the test system: each item and its corresponding IRT-model parameters are recorded in the question bank, and correlation modeling is carried out between the item and the linguistic and semantic parameters in a corpus. The derivation formula between the linguistic and semantic parameters and the parameters of the item's IRT model is:
(β_t, α_t, c_t) = f(t, l_t, s_t, …);
where the dependent variables (β_t, α_t, c_t) are the parameters of item t in the IRT model, l_t and s_t are the item's parameters in the corpus, and f(·) is the correlation mapping.
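One plausible realization of the mapping f(·) is a regression of already-calibrated item parameters on corpus features, which can then predict parameters for new, uncalibrated items. The sketch below is a hypothetical least-squares instance; the feature interpretation and function names are assumptions, not from the patent.

```python
import numpy as np

def fit_parameter_map(features, irt_params):
    """Least-squares fit of f: corpus features -> IRT parameters, trained on
    items that have already been calibrated. features: (n_items, n_feats);
    irt_params: (n_items, 3) with rows (beta_t, alpha_t, c_t)."""
    X = np.hstack([np.ones((features.shape[0], 1)), features])  # intercept column
    coef, *_ = np.linalg.lstsq(X, irt_params, rcond=None)
    return coef

def predict_parameters(coef, new_features):
    """Predict (beta, alpha, c) for not-yet-calibrated items from their
    corpus features, avoiding a full IRT re-calibration."""
    X = np.hstack([np.ones((new_features.shape[0], 1)), new_features])
    return X @ coef
```

With such a map, a newly added item receives provisional IRT parameters from its corpus features alone, which later trials can refine.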
In Step 2 of the invention, the expression of the expected utility function is:
d_t* = argmax_{d_t} U(d_t);
where y_{1:n-1} denotes all observations from the past n-1 subjects, and y_n(1:t) contains the past t-1 test responses y_n(1:t-1) of the current nth subject together with the current candidate observation y_n(t).
In Step 3 of the invention, given the test sample data and the prior distribution of the subject's traits, and assuming the prior density function of the item parameters is g(ξ), the posterior distribution of the unknown IRT item parameters is:
g(ξ | u_n) = L(ξ) g(ξ) / P(u_n);
where P_t(ξ) is the three-parameter IRT model
P_t(ξ) = c_t + (1 - c_t) / (1 + exp(-α_t(θ - β_t)));
L(ξ) = ∏_t P_t(ξ)^{u_tn} (1 - P_t(ξ))^{1 - u_tn} is the likelihood function of the item parameters;
P(u_n) = ∫ L(ξ) g(ξ) dξ is the marginal probability of a given response pattern u_n.
Finally, the item-parameter estimate E(ξ_n | u_n, θ_n) is obtained and updated as the latest IRT parameter.
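As a concrete instance of the expected a posteriori estimate E(ξ | u), the sketch below estimates a single item's difficulty β on a grid, holding α and c fixed — a simplification, since the patent estimates the full parameter vector ξ. All numeric choices are assumptions.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def eap_difficulty(responses, thetas, a=1.0, c=0.2, grid=np.linspace(-4, 4, 201)):
    """EAP estimate of one item's difficulty beta with discrimination a and
    guessing c held fixed. Prior g(beta) is standard normal on the grid."""
    prior = np.exp(-grid ** 2 / 2)
    like = np.ones_like(grid)             # L(beta): likelihood of the responses
    for y, th in zip(responses, thetas):
        p = icc_3pl(th, a, grid, c)       # P(correct | theta=th) at each beta
        like = like * (p if y else 1 - p)
    post = prior * like                   # unnormalized posterior g(beta | u)
    post /= post.sum()
    return float(np.sum(grid * post))     # posterior mean = EAP estimate
```

The same grid construction extends to the full (α, β, c) vector at the cost of a higher-dimensional grid or a sampling scheme.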
In Step 5 of the invention, the posterior distribution of the subject's traits is computed within the framework of the Bayesian hierarchical model; the expression is:
p(θ_n | y_n(1:t), y_{1:n-1}) ∝ p(y_n(t) | θ_n, d_t) p(θ_n | y_n(1:t-1), y_{1:n-1});
where y_n(1:t) = y_n(1:t-1) + y_n(t), and y_n(t) = P(u_n) is obtained in Step 4;
p(η) is the prior distribution of the higher-level model parameter η, and the marginal distribution p(y_{1:n}) is obtained by integrating the above expression over θ_{1:n} and η.
The method integrates the assessment content and presents it primarily in the form of a game.
Based on the above method, the invention also provides a self-adaptive cognitive ability evaluation system, which comprises:
the input unit is used for receiving personal information and answer data which are input by a tested person; the personal information and answer data comprise: the tested age, personal information, and answer data input through a mouse, a keyboard or a microphone;
the evaluation software downloader is used for downloading the latest version of software and keeping the version consistency of the test software;
a universal serial data bus for asynchronous communication;
a data collector for establishing connection and communication between the computer and an external device;
the storage unit is used for storing the question bank of all developmental stages and the evaluation schemes, so that the system can select questions adaptively according to the subject's answer information;
The communication unit is used for communicating with other personal computers and handheld equipment terminals in a wireless mode;
The evaluation processor is the operation unit; it stores the system's main statistical calculation models and executes the main algorithm of the computerized adaptive test. The operation unit sets the prior parameters of the item response theory characteristic function according to the subject's personal information, presents the first test item on the basis of those prior parameters at the start of the test, modifies the parameter values according to the received input, and determines the next test item from the modified parameter values;
the calculation memory is used for calculating parameters under the current evaluation progress, wherein the parameters comprise project characteristic parameters, tested capacity parameters and global average information quantity;
The evaluation data storage is used for storing various parameters and answer data under the current evaluation progress;
and the output unit comprises a liquid crystal display screen for displaying a picture of the game test.
The using method of the evaluation system is as follows: first, after the system is powered on and the system starter is launched, the evaluation software downloader starts and checks whether the current system version is consistent with the content on the remote server; if not, the latest version is downloaded to replace the old one, and if so, the main test program runs. The input unit receives the information entered by the subject and passes the data stream to the evaluation processor. The evaluation processor contains the full mathematical logic of the hierarchical adaptively optimized item response theory model and is responsible for asynchronously scheduling the evaluation data memory and the calculation memory to perform the concrete computations on the current-progress data. The evaluation processor also transmits the evaluation process to the output unit in real time, so that the logical interface of the assessment (or game-based assessment) corresponding to the current progress is shown on the image display. The evaluation data memory is interconnected with the interface of the data collector and stores the entire question bank together with all parameters and response data under the historical evaluation progress; if modification or batch extraction is needed, the data stream is exported through the data collector.
The invention addresses the deficiencies of the traditional ability evaluation system described in the background in three respects: evaluation theory, evaluation technique, and evaluation form. To this end, the invention proceeds as follows:
In the aspect of evaluation theory, the method combines authoritative theories of cognitive neuroscience and developmental education to model cognitive-ability evaluation theoretically, guiding the choice of the statistical model.
Taking reading ability as an example, other cognitive skills can likewise have assessment tasks established through similar authoritative theories:
The language evaluation is constructed according to cognitive neuroscience and developmental education theory, and is divided into pre-literacy abilities (comprising rapid-naming tasks, radical and stroke recognition, and pinyin recognition tasks) and literacy abilities (comprising word reading, orthographic awareness, semantic awareness, and morpheme awareness); in addition, the concept of the "mental lexicon" is introduced to test children's vocabulary size.
According to this theory, the invention develops the related prototype assessment tasks shown in the table below, currently the most systematically complete assessment tool. The invention thereby remedies problems 1-5 in the aspect of evaluation theory.
In addition, the evaluation system of the invention can be further developed and modeled, according to the relevant authoritative theory, for groups including but not limited to the elderly and people with movement disorders or learning disabilities.
In the aspect of evaluation technique, an innovative composite model, called the hierarchical adaptively optimized item response model, is proposed to overcome the lag of existing evaluation techniques. The model takes the framework of traditional computerized adaptive testing and innovatively combines adaptive design optimization, the Bayesian hierarchical model, and item response theory: the Bayesian hierarchical model is built in the subject space to provide the prior distribution of the subject's ability; the adaptive design optimization model is built in the parameter space to select the optimal test strategy; and the item response model characterizes item features at finer granularity, relating them to the subject's ability. The three components complement one another. The excellent statistical performance makes the evaluation system more efficient, the results more accurate, and the testing environment less constrained, while the robustness and generalization ability of the method lend the assessment tool higher credibility and feasibility. The test process is a new kind of test in which the computer automatically selects test items according to the subject's ability level in order to estimate that ability.
Computerized adaptive testing differs from general computerized testing: during the test, the computer not only presents questions, accepts answers, scores automatically, and produces results, but also gives the current test a prior ability estimate based on answer information from past tests and selects suitable items. Within the test, according to the subject's answers to the items, it automatically selects the most appropriate next item based on the currently accumulated information, finally reaching the most appropriate estimate of the subject's ability level. Furthermore, every question in the system (a so-called item) is parametrically modeled so as to examine its contribution to discriminating the subject's ability and its correlation with the theoretical background, which helps the invention screen items better and understand the state of the ability being tested.
As shown in fig. 3, in the model construction of the evaluation system, the computerized adaptive test system improves the traditional CAT framework by combining two inference methods, adaptive design optimization and the Bayesian hierarchical model, seeking to fully exploit past and future data in both the subject space and the parameter space. Because both can be represented within the Bayesian statistical framework, the invention combines them very naturally to achieve greater information gain. First, by combining the Bayesian hierarchical model with adaptive design optimization, researchers obtain prior knowledge of the population before the current test begins, and the hierarchical iterative architecture removes the need, present in traditional CAT systems, to collect a large number of samples to build the question bank. The invention thus provides a more effective design scheme; the combination of the two methods is called hierarchical adaptive design optimization, and its basic components are established here.
Furthermore, within hierarchical adaptive design optimization, the invention innovatively uses item response theory for modeling, so that the reliability parameters are more refined, the ability gaps between subjects tend to a constant scale, and item parameters and subject ability are calibrated on the same scale, solving problems 1-5 in the aspect of evaluation technique in part four. Moreover, because hierarchical adaptive design optimization supplies reasonable prior knowledge, it compensates for the weaknesses of the item response model: the convergence of the item response function is drawn closer to the true value, and estimation accuracy improves markedly. The hierarchical adaptive scheme saves the time cost of the large-scale pre-experiments previously needed to compile a question bank and calibrate fixed item parameters; when new questions are added to the question bank in the future, the item response model need not be re-modeled, and the parameter estimation of all test items can be completed in future actual testing, solving problems 6-10 in the aspect of evaluation technique.
In terms of evaluation form, the evaluation framework of the test has good generalization ability, and various game scripts can be logically built on it, so that game-form assessment holds the subject's attention and encourages serious answering; it can also be extended to special groups such as children, patients with reading disorders, and the elderly, solving the difficulties of evaluation form described above.
2. And (3) improving a concrete calculation model:
The algorithm model is called the hierarchical adaptively optimized item response theory model: the Bayesian hierarchical model is built in the subject space to provide the prior distribution of the subject's ability; the adaptive design optimization model is built in the parameter space to select the optimal test strategy; and the item response model characterizes item features at finer granularity, relating them to the subject's ability. The three components complement one another. Combining them innovatively improves the efficiency, correctness, and accuracy of model inference.
Bayesian hierarchical models in hierarchical adaptively optimized project response models:
Suppose the individual measurement model is given as a probability density or mass function p(y_i | θ_i), with ability parameter θ_i for individual i. The correlation between individuals is expressed by an upper-level model p(θ_{1:n} | η), a regression model with η as coefficients, where θ_{1:n} = (θ_1, …, θ_n) is the set of model parameters of all n individuals. The joint posterior distribution of the hierarchical model given all observations is then:
p(θ_{1:n}, η | y_{1:n}) ∝ p(η) ∏_{i=1}^{n} p(y_i | θ_i) p(θ_i | η);
Then, assuming the (n+1)-th subject is about to begin the test, the prior distribution for this individual given the data of all previous test results is:
p(θ_{n+1} | y_{1:n}) = p(θ_n | y_{1:n-1});
where
p(θ_{n+1} | y_{1:n}) = (1 / p(y_{1:n})) ∫ p(θ_{n+1} | η) [ ∏_{i=1}^{n} ∫ p(y_i | θ_i) p(θ_i | η) dθ_i ] p(η) dη;
here p(η) is the prior distribution of the higher-level model parameter η, and the marginal distribution p(y_{1:n}) is obtained by integrating the above expression over θ_{1:n} and η.
The improvement points are as follows:
1. An iteration mechanism is established from historical test information: all parameters are updated after each question and after each subject's test, eliminating the step, required by traditional CAT systems, of collecting a large amount of sample data up front to build the question bank for the test system.
2. It provides the prior distribution of the subject's ability to adaptive design optimization and the item response model, so that convergence approaches the true value more closely.
3. The prior distribution gives an expected ability for the current subject; if the answers deviate from this expectation — for example, a third-grade student recognizing fewer words than a preschooler — the examiner can discover the problem and the abnormal situation in time.
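The iteration mechanism can be sketched minimally under a strong simplification: instead of integrating over the full posterior p(η | y_{1:n}), the group-level parameters η = (μ, σ) are moment-matched from the ability estimates of finished subjects and used to seed the next subject's prior. Function names and the variance floor are assumptions.

```python
import numpy as np

def update_hyperparameters(ability_estimates):
    """Moment-match the group-level parameters eta = (mu, sigma) from the
    ability estimates of the subjects tested so far. (A simplification:
    the patent integrates over the full posterior p(eta | y_1:n).)"""
    est = np.asarray(ability_estimates, dtype=float)
    mu = float(est.mean())
    sigma = float(est.std(ddof=1)) if est.size > 1 else 1.0
    return mu, max(sigma, 0.1)  # variance floor keeps the prior usable

def prior_for_next_subject(grid, mu, sigma):
    """Discretized normal prior p(theta_{n+1} | y_1:n) on an ability grid."""
    p = np.exp(-(grid - mu) ** 2 / (2 * sigma ** 2))
    return p / p.sum()
```

Each finished subject thus sharpens the starting prior of the next one, which is the practical payoff of improvement point 1.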
An adaptive design optimization model in a hierarchical adaptive optimized project reaction model:
Solve for the candidate design d_t* that maximizes the mutual-information utility of the expected utility function, and administer the corresponding test item to the subject.
The expression for the desired utility function is:
d_t* = argmax_{d_t} U(d_t)
where y_{1:n-1} denotes all observations from the past n-1 subjects, and y_n(1:t) contains the past t-1 test evaluations y_n(1:t-1) of the current nth subject together with the current candidate observation y_n(t).
The improvement points are as follows: the utility equation herein adds previously observed data derived in the bayesian hierarchical model, compared to the utility equation in the background section, and calculates joint probabilities together with the data of the current measurement session, combining adaptive design optimization and bayesian hierarchical derivation, so that the two models are combined.
Item reaction model in hierarchical adaptive optimized item reaction model:
As shown in FIG. 1, the system of the invention adopts the three-parameter model of item response theory, which captures comprehensive item-characteristic detail.
It comprises the following parameters:
(1) Subject ability parameter θ: the ability level of the subject.
(2) Item difficulty parameter β: also known as the "location parameter", it is the point on the ability scale at which the midpoint of the item characteristic curve (ICC) falls — for c = 0, the ability at which the probability of a correct answer is 0.5. The larger the value, the more difficult the item and the further to the right the ICC lies on the ability scale.
(3) Item discrimination parameter α: also called the "scale parameter", it represents the item's power to clearly distinguish between subjects of different ability levels near the inflection point. The larger the value, the better the item discriminates between ability levels.
(4) Guessing parameter c: also called the "asymptotic parameter", it reflects the asymptotic probability of a correct answer as the ability level tends to minus infinity, i.e. the chance that a subject answers the item correctly purely by guessing. The larger the value, the more easily the item is answered correctly regardless of ability; the smaller the value, the less likely a lucky correct answer.
(5) P_tn(ξ): the probability that the nth subject answers item t correctly, given item parameters ξ.
The estimation process comprises the following steps:
1. Combining the observed data (the subjects' response information: the score matrix) with prior information (the ability level of the tested population is assumed to follow a normal distribution), an estimation method is used to obtain the estimate of each item parameter and the estimate of each subject's ability-level parameter.
2. A model-data fit test deletes the non-fitting items, and item-parameter and ability-parameter estimation is performed again on the remaining items.
The improvement points are as follows:
1. Compared with traditional CTT measurement, IRT establishes a relation between item parameters and subject ability parameters, so that both are calibrated on the same scale: it is then known definitively that a subject with ability level 0.6 is more likely to answer an item of estimated difficulty 0.5 correctly than incorrectly, and that the optimal item difficulty for that subject is about 0.6. In traditional CTT measurement this notion is ambiguous.
2. After selecting the three-parameter model suited to this field, the invention does not take an assumed normal distribution as the prior input when estimating parameters, but instead uses the prior value given by the hierarchical adaptive design optimization model, which is closer to the subject's true ability distribution, making the test process more efficient and more accurate.
3. Summary of innovation points:
in summary, the innovation points of the system are as follows:
1. The hierarchical adaptively optimized item response theory fills the gap left by traditional CAT adaptive evaluation systems, which do not characterize the parameters of each test item, giving the adaptive system extremely high robustness and accuracy across the item-parameter space, the ability-parameter space, and the subject space.
2. The hierarchical adaptively optimized item response theory model overcomes the lack, in traditional IRT-kernel computerized adaptive evaluation systems, of a prior distribution of the subject's ability: it provides a prior distribution of initial ability closer to the true value, effectively reduces the risk that the item response model converges to a local optimum, and makes test results more accurate.
3. The hierarchical iterative form of the model removes the need, in traditional IRT-kernel computerized adaptive evaluation systems, to collect thousands of test samples to calibrate item parameters for the question bank, saving a large amount of up-front work.
4. The hierarchical adaptively optimized item response theory re-estimates item parameters and subject ability parameters simultaneously during the adaptive test, and all parameters in the system are automatically and continuously optimized as the test sample size grows, improving the accuracy and robustness of the test system.
5. The test system built on this computational model also relies on a reliable parameter space available from external cognitive-neuroscience theory; after correlation modeling between the item parameters and ability parameters of the IRT model, when items are later added to the question bank or items with poor discrimination are screened out, the IRT need not be re-modeled, and the parameter estimation of all test items can be completed in future actual testing.
6. Connecting the theories of psychology and cognitive science provides the test system with an authoritative and reliable basis in content and theory, establishing for the industry an objective, systematic, and comprehensive measurement tool that reflects the true level of human cognitive function, facilitating future research, intervention, and treatment.
7. The superior evaluation framework can be fused with the logic of game scripts, so that game-form assessment holds the subject's attention and encourages serious answering, and can be extended to special groups such as children, patients with movement disorders, and the elderly. The beneficial effects of the invention include:
Drawings
FIG. 1 is the item characteristic curve graph of the three-parameter model.
FIG. 2 is a system layout of the evaluation system of the present invention.
FIG. 3 is a schematic diagram of the modeling of the evaluation system of the present invention.
Fig. 4 is a schematic view of an evaluation frame of the present invention.
FIG. 5 is a schematic illustration of parametric modeling of the present invention.
Fig. 6 is a flowchart of the evaluation method of the present invention.
Detailed Description
The invention will be described in further detail with reference to the following specific examples and the drawings. Except where specifically noted below, the procedures, conditions, and experimental methods for carrying out the invention are common knowledge in the art, and the invention is not particularly limited thereto.
The system layout of the self-adaptive assessment system for cognitive ability of the present invention is shown in fig. 2. The organization layout is as follows:
On the left side of the evaluation host is the input unit, which receives the personal information and answer data entered by the subject, including the subject's age, personal information, and answer data entered via mouse, keyboard, or microphone.
The evaluation system power supply is a conventional system component and is not described further; only the system-specific components are explained in detail below:
201 Evaluation software downloader
The evaluation software encrypted downloader establishes a direct download channel with the back-end evaluation software library, helping the user download the latest software version in time and keeping the test software versions consistent.
202 UART
A universal serial data bus for asynchronous communication. The bus is bidirectional, enabling full-duplex transmission and reception. In the embedded design, the UART is used for the host to communicate with auxiliary devices, such as the hardware devices of the input and output units.
203-204 USB data collector
The USB interface establishes connection and communication between the computer and external devices; it supports plug-and-play and hot swapping and can connect up to 127 peripherals, such as a mouse and keyboard.
205 Data backup storage
The storage unit stores the question bank of all developmental stages and the evaluation schemes, so that the system can select questions adaptively according to the subject's answer information.
502 WiFi/Bluetooth
An interface for communicating with other terminals such as personal computers and handheld devices in a wireless manner.
301 Evaluation processor
The operation unit stores the system's main statistical calculation models and executes the main algorithm of the computerized adaptive test.
It performs the following steps:
The item response theory characteristic function (described in detail below).
Setting the prior parameters of the test according to the subject's personal information.
Presenting the first test item (stimulus) on the basis of the prior parameters set at the start of the test.
Modifying the parameter values according to the received input.
Determining the next test item according to the modified parameter values.
302 System starter
And a system power supply.
303 Calculation memory
Calculates the various parameters under the current evaluation progress (item characteristic parameters, subject ability parameters, and the global average information quantity).
304 Evaluation data memory
Stores the various parameters and response data under the current evaluation progress.
501 External evaluation image display and touch screen
The output unit comprises a liquid crystal display screen for presenting a picture of the game test.
The using method of the evaluation system is as follows: first, after the system is powered on and the system starter is launched, the evaluation software downloader starts and checks whether the current system version is consistent with the content on the remote server; if not, the latest version is downloaded to replace the old one, and if so, the main test program runs. The input unit receives the information entered by the subject and passes the data stream to the evaluation processor. The evaluation processor contains the full mathematical logic of the hierarchical adaptively optimized item response theory model and is responsible for asynchronously scheduling the evaluation data memory and the calculation memory to perform the concrete computations on the current-progress data. The evaluation processor also transmits the evaluation process to the output unit in real time, so that the logical interface of the assessment (or game-based assessment) corresponding to the current progress is shown on the image display. The evaluation data memory is interconnected with the interface of the data collector and stores the entire question bank together with all parameters and response data under the historical evaluation progress; if modification or batch extraction is needed, the data stream is exported through the data collector.
Examples
This embodiment illustrates the present invention by taking the evaluation of reading ability as an example.
Evaluation task
Evaluation frame
The evaluation frame is described in detail as follows:
As shown in FIG. 4, the measurement flow of the system is divided into two cyclically nested layers. The gray part is the adaptive design optimization portion, built in the parameter space; it is the cyclic flow of a single subject within one test, with t the test-item number. The layer outside the gray part belongs to the Bayesian hierarchical model, built in the subject space; it is the cyclic flow over all historical subjects' answer data, with n the subject number.
In the adaptive design optimization part there are five processes, whose functions are as follows.
Suppose that n-1 subjects have currently completed the test, and the current nth subject has completed the (t-1)-th trial and must now take the t-th trial:
Flow 1. Prior distribution of the subject's ability
The prior distribution of the subject's ability is assigned from the posterior distribution obtained in Flow 5 after (t-1) test items, expressed as:
p(θ_n | y_n(1:t-1), y_{1:n})
If the subject is in the initial test, the initial prior distribution is assigned an initial value by the Bayesian hierarchical model.
Flow 2 adaptive optimal design
Solve for the candidate design d_t* that maximizes the mutual-information utility of the expected utility function, and administer the corresponding test item to the subject.
The expression of the expected utility function is:
d_t* = argmax_{d_t} U(d_t)
where y_{1:n-1} denotes all observations from the past n-1 subjects, and y_n(1:t) contains the past t-1 test evaluations y_n(1:t-1) of the current nth subject together with the current candidate observation y_n(t). Note that, compared with the utility equation in the technical background section, the utility equation here adds the previously observed data derived in the Bayesian hierarchical model and computes a joint probability with the data of the current measurement session; this combines adaptive design optimization with the derivation of the Bayesian hierarchical model, joining the two models at this point.
Flow 3 project reaction model solving
For the item response model, expected a posteriori estimation is used to construct the posterior distribution of the unknown parameters. Given the test sample data and the prior distribution of the subject's traits, and assuming the prior density function of the item parameters is g(ξ), the posterior distribution of the unknown item-response-model parameters is:
g(ξ | u_n) = L(ξ) g(ξ) / P(u_n);
where P_t(ξ) is the three-parameter item response model
P_t(ξ) = c_t + (1 - c_t) / (1 + exp(-α_t(θ - β_t)));
L(ξ) = ∏_t P_t(ξ)^{u_tn} (1 - P_t(ξ))^{1 - u_tn} is the likelihood function of the item parameters;
P(u_n) = ∫ L(ξ) g(ξ) dξ is the marginal probability of a given response pattern u_n.
Finally, the item parameter E(ξ_n | u_n, θ_n) is obtained and updated as the latest item parameter.
In this flow, each time a test item is completed, the parameters of all items in the item response model are updated.
Flow 4. Obtaining the observation y_n(t) = P(u_n)
Flow 5. Calculate the posterior distribution of the examinee's trait
The posterior distribution of the examinee's trait is computed here within the framework of the Bayesian hierarchical model, expressed as:
p(θ_n | y_n^{(1:t)}, y_{1:n-1}) ∝ p(y_n^{(1:t)} | θ_n) p(θ_n | y_{1:n-1})
where
p(θ_{1:n}, η | y_{1:n}) = p(y_{1:n} | θ_{1:n}) p(θ_{1:n} | η) p(η) / p(y_{1:n})
where y_n^{(1:t)} = y_n^{(1:t-1)} + y_n^{(t)}, and y_n^{(t)} = P(u_n) is given in Flow 4.
Here p(η) is the prior distribution of the higher-level model parameters η, and the marginal distribution p(y_{1:n}) is obtained by integrating the numerator of the expression above over θ_{1:n} and η.
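Flow 5's trait posterior can be sketched on a grid. The hierarchical model is summarized here by a normal prior whose mean and standard deviation stand in for the higher-level parameters η (an assumption for illustration), and the likelihood multiplies the 3PL response probabilities of the items answered so far.

```python
import numpy as np

def p3pl(theta, beta, a=1.0, c=0.2):
    """3PL response probability."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - beta)))

def trait_posterior(theta_grid, hier_mean, hier_sd, betas, responses):
    """p(theta | y) ∝ p(y | theta) * p(theta | y_1:n-1) on a discrete grid."""
    prior = np.exp(-0.5 * ((theta_grid - hier_mean) / hier_sd) ** 2)
    prior /= prior.sum()                       # prior from the hierarchical model
    like = np.ones_like(theta_grid)
    for b, u in zip(betas, responses):
        p = p3pl(theta_grid, b)
        like *= p if u else (1.0 - p)          # p(y_n^(1:t) | theta_n)
    post = prior * like
    post /= post.sum()
    return post

theta_grid = np.linspace(-4, 4, 161)
# Correct on items of difficulty -1 and 0, incorrect on difficulty 1 (simulated).
post = trait_posterior(theta_grid, hier_mean=0.0, hier_sd=1.0,
                       betas=[-1.0, 0.0, 1.0], responses=[1, 1, 0])
theta_eap = float(np.sum(theta_grid * post))   # posterior mean of the trait
```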
Flow 6. Judge whether the test termination condition is satisfied
Judge whether the current total amount of information satisfies the termination condition; if the current amount of information exceeds the threshold, the test ends and the evaluation result is obtained.
If the current amount of information does not satisfy the termination condition, the posterior distribution of the examinee's trait obtained in Flow 5 is used as the examinee's prior distribution in Flow 1 for the next test item, and the loop continues.
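The termination rule of Flow 6 can be sketched as a loop that accumulates item information and stops once it exceeds a threshold. The Fisher information function of the 3PL model and the particular threshold value are standard choices assumed here, not values given by the patent.

```python
import numpy as np

def item_information(theta, beta, a=1.0, c=0.2):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta - beta)))
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

def run_until_informative(theta_hat, item_difficulties, threshold):
    """Administer items until the accumulated information exceeds the threshold."""
    total_info, administered = 0.0, 0
    for beta in item_difficulties:
        total_info += item_information(theta_hat, beta)
        administered += 1
        if total_info > threshold:      # termination condition met
            break
    return total_info, administered

info, n_items = run_until_informative(theta_hat=0.0,
                                      item_difficulties=[0.0] * 50,
                                      threshold=2.0)
```

Each matched item (beta = theta) contributes about 1/6 of a unit of information under these parameter values, so the loop stops after roughly a dozen items rather than exhausting the 50-item stream.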
Flow 7. Bayesian hierarchical model update
When one examinee finishes the test, the Bayesian hierarchical model over the examinee population is updated so that the next examinee's test begins with an appropriate initial value. The update expression is as follows:
p(θ_{n+1} | y_{1:n}) = p(θ_n | y_{1:n-1})
where
p(θ_{1:n}, η | y_{1:n}) = p(y_{1:n} | θ_{1:n}) p(θ_{1:n} | η) p(η) / p(y_{1:n})
with p(η) the prior distribution of the higher-level model parameters η; the marginal distribution p(y_{1:n}) is obtained by integrating the numerator over θ_{1:n} and η.
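A minimal sketch of Flow 7's between-examinee update: the higher-level parameters η are summarized as the mean and spread of the trait estimates collected so far, so that examinee n+1 starts from p(θ_{n+1} | y_{1:n}). This empirical-Bayes style pooling, including the lower bound that keeps the prior from becoming overconfident, is an assumption for illustration, not the patent's exact update.

```python
import numpy as np

def update_population_prior(trait_estimates):
    """Return (mu, sigma) of the starting prior handed to the next examinee."""
    est = np.asarray(trait_estimates, dtype=float)
    mu = est.mean()
    # Guard against an overconfident prior when few examinees have been seen.
    sigma = max(est.std(ddof=1) if est.size > 1 else 1.0, 0.5)
    return mu, sigma

# Trait estimates for the first n examinees (simulated values).
mu, sigma = update_population_prior([-0.2, 0.4, 0.1, 0.9, -0.5])
```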
Additional flow: parametric modeling
As shown in FIG. 5, to eliminate the need to re-fit the IRT model when the question bank is extended in the future, the present invention parametrically models each item in the test system. First, each item and the parameters of its item response model are recorded in the test system's question bank, and a correlation model is built between these and the item's various linguistic and semantic parameters in a large-scale corpus.
From this, the invention obtains a mapping from an item's linguistic and semantic features to each parameter of the item response model:
y(β_t, α_t, c_t) = f(t, l_t, s_t, …)
Here the dependent variables (β_t, α_t, c_t) are the parameters of item t in the IRT model, l_t and s_t are the item's parameters in the corpus, and f(·) is the correlation method. For example, in a vocabulary test system each item is a Chinese word; the invention models the correlation between each item's linguistic-semantic parameters in the corpus (such as word frequency, emotional valence, and clustering coefficient) and the item parameters estimated from a large number of past tests in the item response model (difficulty, discrimination, guessing coefficient, and so on).
When a new item needs to be added, the correlation model can estimate the item's unknown item-response-model parameters from the new item's known corpus parameters, without estimating them from a large number of test results, so that the item can quickly be put into future tests. This modeling method effectively improves the extensibility of the test system.
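The parametric-modeling flow can be sketched with a linear correlation model: calibrated items' corpus features (hypothetical log word frequency l_t and clustering coefficient s_t values) are regressed onto their IRT difficulty β_t, and the fitted map f then estimates β for a brand-new item from corpus features alone. Linear least squares, the feature set, and all the numbers below are assumptions made here for illustration.

```python
import numpy as np

# Calibrated items: rows of [log word frequency, clustering coefficient]
# with their IRT difficulties (rarer words are harder). Simulated values.
features = np.array([[6.0, 0.30], [5.0, 0.22], [4.0, 0.21],
                     [3.0, 0.14], [2.0, 0.12]])
betas = np.array([-1.5, -0.8, 0.0, 0.9, 1.6])

X = np.hstack([features, np.ones((len(features), 1))])  # add intercept column
coef, *_ = np.linalg.lstsq(X, betas, rcond=None)        # fit the map f

def predict_difficulty(log_freq, clustering):
    """Estimate beta_t for a new item from its corpus parameters alone."""
    return float(np.array([log_freq, clustering, 1.0]) @ coef)

# A new, never-administered item with known corpus features only.
beta_new = predict_difficulty(log_freq=2.5, clustering=0.12)
```

The new low-frequency item is predicted to be fairly difficult without any response data, which is exactly the question-bank extension the flow is designed to support.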
Evaluation form
The system combines the evaluation content of the evaluation task with the test framework of the evaluation framework and uses a game as the main output form. This test form is more visually attractive and suits children whose attention is easily distracted, improving the utility of the evaluation system.
Taking the vocabulary capability evaluation as an example, the game development example is as follows:
The child pilots an airplane. In a round representing the vocabulary-size sub-ability stage, test words are continuously written on oncoming gift bags, and the child must judge in time whether each test word is a real word or a pseudoword: for a real word the airplane should touch the gift bag, and for a pseudoword the airplane should avoid it. Correctly identifying a real word earns the gold coins in its gift bag, while hitting a pseudoword bag triggers the bomb inside; when bomb hits exhaust the base health points, the round fails and the vocabulary level drops to the previous level.
The adaptive evaluation system and method for cognitive ability in this embodiment are constructed on the basis of cognitive development theory and of learning theories both specific to reading and general, respectively covering the evaluation of tasks related to reading cognitive abilities and of general, non-specific performance. The invention is currently the evaluation tool with the widest functional coverage and the most systematic completeness.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to one skilled in the art are included in the invention without departing from the spirit and scope of the inventive concept, and the scope of the invention is defined by the appended claims.

Claims (10)

1. An adaptive assessment system for cognitive ability, the system comprising:
The input unit is used for receiving personal information and answer data which are input by a tested person;
a universal serial data bus for asynchronous communication;
a data collector for establishing connection and communication between the computer and an external device;
the storage unit is used for storing the question banks and evaluation schemes for all development stages, so that the system can adaptively select questions according to the examinee's answer information;
a communication unit for wirelessly communicating with a personal computer, a handheld device terminal;
an arithmetic unit, in which the system's statistical calculation models are stored and the computerized adaptive testing algorithm is executed;
the calculation memory is used for calculating parameters under the current evaluation progress, wherein the parameters comprise project characteristic parameters, tested capacity parameters and global average information quantity;
The evaluation data storage is used for storing various parameters and answer data under the current evaluation progress;
and the output unit comprises a liquid crystal display screen for displaying a picture of the game test.
2. The adaptive evaluation system of claim 1, wherein the personal information and answer data comprise: the examinee's age, personal information, and answer data input via a mouse, a keyboard, or a microphone.
3. The adaptive evaluation system of claim 1, wherein the arithmetic unit stores: prior parameters of the item response theory characteristic function, set for the test according to the examinee's personal information; at the start of the test the arithmetic unit gives the first test question based on the set prior parameters, modifies the parameter values according to the received input, and determines the next test question according to the modified parameter values.
4. The adaptive evaluation system of claim 1, further comprising an evaluation software downloader for updating software to maintain version consistency of test software.
5. The adaptive assessment system according to claim 1, wherein said system employs an adaptive assessment method of cognitive ability, said method comprising the steps of:
Step 1, if the examinee is taking the initial test, a Bayesian hierarchical model assigns an initial value to the initial prior distribution; if n−1 examinees have completed the test and the nth examinee, having completed the (t−1)-th test item, is about to take the t-th item, the examinee's prior distribution is assigned from the posterior distribution obtained after the (t−1)-th item;
Step 2, solve for the candidate design that maximizes the mutual-information utility of the expected utility function, and administer the corresponding test item to the examinee;
Step 3, calculate the expected a posteriori estimates using a three-parameter item response model, and construct the posterior distribution of its unknown parameters;
Step 4, obtain the observation result;
Step 5, calculate the posterior distribution of the examinee's trait;
Step 6, judge whether the current total amount of information satisfies the termination condition; if the current amount of information exceeds the threshold, end the test and obtain the evaluation result; if it does not satisfy the termination condition, return to step 1, using the posterior distribution of the examinee's trait as the prior distribution for the next test item, and continue the loop;
to avoid re-fitting the item response model when the question bank is extended, parametric modeling is further performed: each item and its corresponding item-response-model parameters are recorded in the question bank, and a correlation model is built between each item and the cognitive-ability-related parameters in the corpus; the method combines the evaluation content and outputs the result in the form of a game.
6. The adaptive evaluation system of claim 5, further comprising: step 7, when one examinee finishes the test, updating the Bayesian hierarchical model over the examinee population and giving an appropriate initial value at the start of the next examinee's test, the update expression being:
p(θ_{n+1} | y_{1:n}) = p(θ_n | y_{1:n-1});
where
p(θ_{1:n}, η | y_{1:n}) = p(y_{1:n} | θ_{1:n}) p(θ_{1:n} | η) p(η) / p(y_{1:n})
where p(η) is the prior distribution of the higher-level model parameters η, and the marginal distribution p(y_{1:n}) is obtained by integrating the numerator over θ_{1:n} and η.
7. The adaptive evaluation system of claim 5, wherein in step 2, the expression for the expected utility function is:
d_t* = argmax_{d_t} U(d_t);
where y_{1:n-1} denotes all observations from the past n−1 examinees, and y_n^{(1:t)} contains the current nth examinee's past t−1 test evaluations y_n^{(1:t-1)} and the current candidate observation y_n^{(t)}.
8. The adaptive evaluation system of claim 5, wherein in step 3, if the test sample data and the prior distribution of the examinee's trait are determined, and the prior density function of the item parameters is g(ξ), the posterior distribution of the unknown item response model's item parameters is:
g(ξ_n | u_n, θ_n) = L(ξ_n) g(ξ_n) / P(u_n)
where P_t(ξ) is the three-parameter item response model
P_t(ξ) = c_t + (1 − c_t) / (1 + exp(−α_t(θ − β_t)))
where L(ξ) is the likelihood function of the item parameters,
L(ξ) = ∏_t P_t(ξ)^{u_t} [1 − P_t(ξ)]^{1−u_t}
where P(u_n) = ∫ L(ξ) g(ξ) dξ denotes the marginal probability of a given response pattern u_n;
finally, the item-parameter estimate E(ξ_n | u_n, θ_n) is obtained and updated as the latest IRT model parameters;
and/or,
in step 5, the posterior distribution of the examinee's trait is calculated within the framework of a Bayesian hierarchical model, expressed as:
p(θ_n | y_n^{(1:t)}, y_{1:n-1}) ∝ p(y_n^{(1:t)} | θ_n) p(θ_n | y_{1:n-1})
where
p(θ_{1:n}, η | y_{1:n}) = p(y_{1:n} | θ_{1:n}) p(θ_{1:n} | η) p(η) / p(y_{1:n})
where y_n^{(1:t)} = y_n^{(1:t-1)} + y_n^{(t)}, and y_n^{(t)} = P(u_n) is obtained in step 4;
where p(η) is the prior distribution of the higher-level model parameters η, and the marginal distribution p(y_{1:n}) is obtained by integrating the numerator over θ_{1:n} and η.
9. The adaptive evaluation system of claim 5, wherein the solution methods for the item response model parameters comprise: joint likelihood estimation, Bayesian expected a posteriori estimation, and Markov chain Monte Carlo methods.
10. The adaptive evaluation system of claim 5, wherein if the system is developed for reading ability, the correlation between the item parameters in the system and external linguistic parameters is modeled; and/or, if the system is developed for reading ability, the mapping between each parameter of the item response model and the linguistic and semantic parameters outside the system is:
y(β_t, α_t, c_t) = f(t, l_t, s_t, …);
where the dependent variables (β_t, α_t, c_t) are the parameters of item t in the IRT model, l_t and s_t are the item's parameters in the corpus, and f(·) is the correlation method.
CN202410327398.3A 2020-03-31 2020-03-31 Self-adaptive evaluation system for cognitive ability Pending CN118095453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410327398.3A CN118095453A (en) 2020-03-31 2020-03-31 Self-adaptive evaluation system for cognitive ability

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010242617.XA CN111626420B (en) 2020-03-31 2020-03-31 Self-adaptive evaluation method and system for cognitive ability and use method of self-adaptive evaluation method and system
CN202410327398.3A CN118095453A (en) 2020-03-31 2020-03-31 Self-adaptive evaluation system for cognitive ability

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010242617.XA Division CN111626420B (en) 2020-03-31 2020-03-31 Self-adaptive evaluation method and system for cognitive ability and use method of self-adaptive evaluation method and system

Publications (1)

Publication Number Publication Date
CN118095453A true CN118095453A (en) 2024-05-28

Family

ID=72271803

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202410327398.3A Pending CN118095453A (en) 2020-03-31 2020-03-31 Self-adaptive evaluation system for cognitive ability
CN202010242617.XA Active CN111626420B (en) 2020-03-31 2020-03-31 Self-adaptive evaluation method and system for cognitive ability and use method of self-adaptive evaluation method and system
CN202410327404.5A Pending CN118246560A (en) 2020-03-31 2020-03-31 Application method of self-adaptive evaluation system for cognitive ability

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN202010242617.XA Active CN111626420B (en) 2020-03-31 2020-03-31 Self-adaptive evaluation method and system for cognitive ability and use method of self-adaptive evaluation method and system
CN202410327404.5A Pending CN118246560A (en) 2020-03-31 2020-03-31 Application method of self-adaptive evaluation system for cognitive ability

Country Status (1)

Country Link
CN (3) CN118095453A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114259205A (en) * 2020-09-16 2022-04-01 中国科学院脑科学与智能技术卓越创新中心 Brain cognitive function detection system
CN112446809B (en) * 2020-11-25 2022-08-12 四川大学 Mental health comprehensive self-adaptive evaluation method and system
CN113850512A (en) * 2021-09-27 2021-12-28 平安科技(深圳)有限公司 User ability grading method, device, equipment and medium based on self-adaptive evaluation
CN117910862A (en) * 2022-04-13 2024-04-19 上海职鼎网络科技有限公司 Post capability measuring device
CN115935191A (en) * 2023-01-05 2023-04-07 广东中大管理咨询集团股份有限公司 Big data analysis-based capacity measurement method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6832069B2 (en) * 2001-04-20 2004-12-14 Educational Testing Service Latent property diagnosing procedure
CN106960245A (en) * 2017-02-24 2017-07-18 中国科学院计算技术研究所 A kind of individualized medicine evaluation method and system based on cognitive process chain
US20190130511A1 (en) * 2017-11-02 2019-05-02 Act, Inc. Systems and methods for interactive dynamic learning diagnostics and feedback
CN109857835B (en) * 2018-12-28 2021-04-02 北京红山瑞达科技有限公司 Self-adaptive network security knowledge evaluation method based on cognitive diagnosis theory

Also Published As

Publication number Publication date
CN118246560A (en) 2024-06-25
CN111626420B (en) 2024-03-22
CN111626420A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111626420B (en) Self-adaptive evaluation method and system for cognitive ability and use method of self-adaptive evaluation method and system
Woodrow A model of adaptive language learning
Jang Demystifying a Q-matrix for making diagnostic inferences about L2 reading skills
CN111914532A (en) Chinese composition scoring method
Pyke et al. Calculator use need not undermine direct-access ability: The roles of retrieval, calculation, and calculator use in the acquisition of arithmetic facts.
Hou et al. Modeling language learning using specialized Elo rating
Yiran Evaluation of students’ IELTS writing ability based on machine learning and neural network algorithm
Lalot et al. Aware of the future? Adaptation and refinement of the Futures Consciousness Scale.
Zhang A New Machine Learning Framework for Effective Evaluation of English Education.
Yang et al. Automated evaluation of the quality of ideas in compositions based on concept maps
Liu et al. Investigating the Relationship between Learners' Cognitive Participation and Learning Outcome in Asynchronous Online Discussion Forums.
Chen et al. Classification and analysis of moocs learner’s state: The study of hidden markov model
CN113157932B (en) Metaphor calculation and device based on knowledge graph representation learning
Yaneva et al. Using linguistic features to predict the response process complexity associated with answering clinical MCQs
Xia Prediction of learning behavior based on improved random forest algorithm
Chen [Retracted] An Analysis of the Effects of the English Language and Literature on Students’ Language Ability from a Multidimensional Environment
Yang et al. Automatic assessment of divergent thinking in Chinese language with TransDis: A transformer-based language model approach
Power The origins of Russian-Tajik Sign Language: Investigating the historical sources and transmission of a signed language in Tajikistan
YIN A compression-based BiLSTM for treating teenagers’ depression chatbot
Han An evaluation method of English online learning behaviour based on feature mining
OPINCARIU FROM TRADITIONAL LINGUISTICS TO COMPUTATIONAL LINGUISTICS. THE RELEVANCE OF DIGITAL CORPUSES IN EDUCATION.
Angheloiu et al. Alternative futures as a method for equipping the next generation of designers and engineers
Zhang RETRACTED: Cultivation and interpretation of students' psychological quality: Vocal psychological model
Wang [Retracted] Optimization and Evaluation of Oral English CAF Based on Artificial Intelligence and Corpus
Hadas et al. Using large language models to evaluate alternative uses task flexibility score

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination