WO2019243876A1

WO2019243876A1 - Method, system and computer program for determining weights of representativeness in individual-level data

Info

Publication number: WO2019243876A1
Application number: PCT/IB2018/054587
Authority: WO
Inventors: Mathieu TREPANIER; Mikael BOURQUI
Original assignee: Tsquared Insights Sa
Priority date: 2018-06-21
Filing date: 2018-06-21
Publication date: 2019-12-26

Abstract

Computerized method for determining weights of representativeness in individual-level data, wherein the individual-level data comprise for a panel of persons search queries performed by each person of the panel, wherein the panel of persons of the individual-level data is a subset of a predetermined population of persons, wherein reference data comprises the occurrences of the search queries performed by the persons of the population, the method comprising the step of: determining, in a processing means, for each person of the panel the weight of representativeness of this person based on the occurrences of the search queries for this person of a defined set of search queries of the individual-level data and based on the occurrence of the search queries of the same set of search queries in the reference data.

Description

Method, system and computer program for determining weights of representativeness in individual-level data

Field of the invention

[0001] The present invention concerns a method, system and computer program for determining weights of representativeness in individual-level data and for performing behavioral analysis in individual-level data.

Description of related art

[0002] Surveys try to estimate the opinion, needs, wants, and habits of a population on a certain question. This question could be the outcome of an election, a market study, etc. The quality of the result of the survey depends largely on the representativeness of the panel (also called sample) which is a subset of the population. The same problem arises if individual-level data of a panel are analysed for a question which should be answered at the population level. The ability to interpret the results of any sample-based research as relevant at the population level relies on the degree to which the sample is representative of the population. Knowledge about an unrepresentative sample cannot be generalized to the population and is of limited value to anyone who needs to base decisions on data.

[0003] Reweighting for representativeness is common in survey research. The standard approach involves assigning to each panel participant a weight to compensate for the fact that the participant's observable characteristics (e.g. socio-demographic characteristics) are over- or under-represented in the panel as compared to the population. In survey research, the need for reweighting arises when the characteristics of the sample of respondents diverge from those of the population, because of factors like lower or higher response rates among particular groups of people. The weights applied to the respondents to a survey are normally determined using their demographic attributes such as age, gender, and geographical location. This assumes that if the data is adjusted (through reweighting) in such a way that each demographic group is represented in proportion to its size in the general population, measurements on the behaviour of the sample will accurately represent the behaviour of the population.

[0004] This approach has the obvious limitation that differences in behaviour do not usually coincide exactly with demographic categories. Indeed, it is impossible to know whether a sample accurately represents the behaviour of the population by looking at demographic attributes alone. A sample might be 'representative' in terms of its age, gender, geography, or even income distribution while still over- or under-sampling groups with certain hard-to-measure attributes such as personality traits (extraversion, curiosity, conscientiousness, ...) or personal history (experience with computers, travel abroad, exposure to media, ...). Many survey designs attempt to take such factors into account, but can only do so to the extent that the factors to correct for have been anticipated by the researcher and can be measured. Therefore, this approach cannot yet be fully automatized and still needs to be designed by human interactions to define or correct the weights of the panel participants. This is also due to the fact that the demographic attributes and the non-demographic attributes are normally not available for all persons. It is thus problematic to use this method for the automatized analysis of individual-level data as for example in the behavioural analysis of individual-level data.

Brief summary of the invention

[0005] It is an object to find an improved method for determining weights for representativeness for participants of individual-level data. In particular, the method should be fully automatized and/or improve the accuracy of the estimates achieved by weighting for representativeness with the improved weights of representativeness.

[0006] It is a further object to fully automatize and/or improve the behavioural analysis of individual-level data.

[0007] This object is solved by the independent claims.

[0008] The weights of representativeness are determined based on the occurrences of the search queries of the persons of the panel and the occurrences of the search queries of a population which should be represented by the panel. First, this allows a fully automatic determination of the weights of representativeness. Second, the representativeness of the results of the panel for the population is significantly improved, because the representativeness is determined on data representing the behaviour. This allows the representativeness of each individual to be determined significantly better than with demographic attributes. Third, this also allows the analysis of data for which demographic attributes of the persons of the panel are missing or incomplete. Fourth, this method is particularly well suited for representing a population which is limited to the internet-using part of a general population.

[0009] The dependent claims refer to advantageous embodiments.

[0010] The ways of calculating the weights of representativeness defined in the dependent claims are particularly well suited to improve the representativeness of the panel.

[0011] An alternative embodiment of the invention is a computerized method for determining weights of representativeness in individual-level data, wherein the individual-level data comprise for a panel of persons e-commerce shopping events performed by each person of the panel, wherein the panel of persons of the individual-level data is a subset of a predetermined population of persons, wherein reference data comprises the occurrences of the e-commerce shopping events performed by the persons of the population, the method comprising the step of determining, in a processing means, for each person of the panel the weight of representativeness of this person based on the occurrences of the e-commerce shopping events for this person of a defined set of e-commerce shopping events of the individual-level data and based on the occurrence of the e-commerce shopping events of the same set of e-commerce shopping events in the reference data.

[0012] In one embodiment, the weight of representativeness of each person is further based on the occurrences of the e-commerce shopping events for all persons of the panel of the defined set of e-commerce shopping events of the individual-level data. Preferably, the weights of representativeness are determined iteratively. Preferably, the weight of representativeness of each person in a second or higher iteration is further based on the weight of representativeness of this person from the previous iteration. Preferably, the weight of representativeness of each person in a second or higher iteration is further based on the weight of representativeness of all persons of the panel from the previous iteration. Preferably, the following steps are performed in the processing means in each iteration: selecting the defined set of e-commerce shopping events in the individual-level data for this iteration; determining for each person of the panel an intermediate weight of representativeness of this person based on the occurrences of the e-commerce shopping events for this person, preferably for all persons of the panel of the set of e-commerce shopping events selected in this iteration and based on the occurrences of the e-commerce shopping events of the same set of e-commerce shopping events in the reference data, and determining for each person of the panel the weight of representativeness of this person based on the intermediate weight of representativeness of this person. Preferably, for the second or higher iteration, the weight of representativeness of each person of the panel is based on the intermediate weight of representativeness of this person of this iteration and the weight of representativeness of this person from the previous iteration.

[0013] In one embodiment, the weight of representativeness of each person or the intermediate weight of representativeness of each person is determined, in the processing means, based on a weighted sum over all e-commerce shopping events of the defined set of e-commerce shopping events of the multiplication of the occurrence of each e-commerce shopping event of the defined set of e-commerce shopping events for this person with a weight of each e-commerce shopping event of the defined set of e-commerce shopping events. Preferably, the weight of representativeness of each person or the intermediate weight of representativeness of each person is further determined, in the processing means, based on a sum over all e-commerce shopping events of the defined set of e-commerce shopping events of the weight of each e-commerce shopping event of the defined set of e-commerce shopping events. Preferably, the weight of representativeness of each person or the intermediate weight of representativeness of each person is determined, in the processing means, based on a ratio of a sum of an offset and of the sum over all e-commerce shopping events of the defined set of e-commerce shopping events of the weight of each e-commerce shopping event of the defined set of e-commerce shopping events divided by a sum of the offset and of the weighted sum. Preferably, the weight of each e-commerce shopping event is determined based on the ratio of the cumulative occurrence of this e-commerce shopping event of all persons of the panel divided by the occurrence of this e-commerce shopping event in the population.

[0014] In one embodiment, the cumulative occurrence of this e-commerce shopping event of all persons of the panel is based on the sum over all persons of the panel of the multiplication of the occurrence of this e-commerce shopping event of each person of the panel with the weight of representativeness of the respective person from the previous iteration.

Brief Description of the Drawings

[0015] The invention will be better understood with the aid of the description of an embodiment given by way of example and illustrated by the figures, in which:

Fig. 1 shows a view of a schematic embodiment of a system according to the invention.

Fig. 2 shows a view of a schematic embodiment of a method for behavioural analysis in individual-level data according to the invention.

Fig. 3 shows a view of a schematic embodiment of a method for determining weights of representativeness in individual-level data according to the invention.

Detailed Description of possible embodiments of the Invention

[0016] The term "person" is used herein to define any distinguishable individual. The distinguishable individual is normally automatically detected, e.g. by device tracking, etc. such that the person can also refer to an identified device. This automatic detection of the persons might not always be correct such that two persons might in fact refer to two humans. The term person shall not be interpreted as the human behind the person, but as the distinguishable individual defined in the individual-level data (see definition below) deemed to be a different human than the other persons. Preferably, the person is distinguishable in the individual-level data by an identifier. This identifier could be simply a code, a number, name, etc. Preferably, the identifier is anonymous such that the person cannot be identified.

[0017] The term "population" defines a set of persons with at least one common demographic attribute. This at least one common demographic attribute is preferably the geographical region, e.g. the country. The population could be all persons of a country region comprising at least one country, e.g. Switzerland, European Union, etc. It is also possible that the at least one common demographic attribute comprises (in addition) the age. The population could thus comprise all persons of a certain age group (in a certain geographical region). Due to the method defined later, the population is restricted to the internet-using persons having the at least one common demographic attribute. For example, the population comprises all internet-using persons of a certain country region.

[0018] The term "search query" refers to any identifier defining a search in the internet by a person in a search engine. The search query is preferably a keyword. A keyword can comprise one or more words. A word can be any concatenation of characters like letters, numbers or other signs defined by character encodings like American Standard Code for Information Interchange (ASCII) or Unicode transformation format (UTF). However, the search query could refer also to complex search query identifiers like images, sounds, locations, etc.

[0019] The term "individual-level data" refers to any data which are associated to different persons, in particular to the identifiers of the persons. The individual-level data comprise thus for each person (or its identifier) data associated to the person. This data for some of those persons can be empty, if this person was maybe not active for example in a relevant period. The individual-level data comprise at least the search queries performed by the persons. Thus, the individual-level data comprise for each person the search queries performed. Preferably, the individual-level data comprise for each search query the keyword searched (the term entered into the search engine by the user) and the identifier identifying (maybe anonymously) the person performing the search. The search queries performed by each person could be stored in different ways. The individual-level data could comprise directly the occurrence of each search query performed by each user. Alternatively or in addition, the individual-level data could comprise the search queries performed by each user over the time (with or without a time stamp). This allows the occurrence to be determined from the individual-level data. In other words, the occurrence of the search queries performed by each person might be stored directly or indirectly in the individual-level data. In the latter case, the occurrence of the search queries can be retrieved from the individual-level data. The individual-level data comprise preferably further data associated to each person, e.g. location data, visited web pages, etc. The individual-level data can be already pre-processed for the below described methods. The pre-processing could comprise anonymization and/or a categorization. The categorization could for example group the activity of each person (e.g. search queries) in different time slots (hours, days, weeks, months, years) or in only one time slot to be analysed. The individual-level data can however also be rather in a raw format such that the below method could comprise the steps of anonymization or categorization or of retrieving the information necessary for the below mentioned methods. A person of the individual-level data indicates a person in relation to whom data, in particular performed search queries, are stored in the individual-level data. The individual-level data can also be device-level data. In this case, each person corresponds to a device and/or each device corresponds to a person. The individual-level data preferably comprise records of actions of the persons. The actions can be search queries or other actions like visiting web-sites, opening applications, visiting locations, etc. Many records of actions about one person constitute the data about the behaviour of the person.

[0020] The term "reference data" shall comprise any data which indicate the occurrence of different search queries in the population. This can be directly the list of search queries with their respective occurrence. However, it is also possible that the reference data contains only indirectly the information of the occurrence of different search queries in the population and that this information must be retrieved from the reference data.
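As an illustration of occurrences that are stored only indirectly, the following minimal sketch counts how often each person performed each search query from a raw list of records. The record layout, the identifiers `p1` to `p3` and the function name `count_occurrences` are assumptions made for this example; they are not taken from the description.

```python
from collections import Counter, defaultdict

# Hypothetical raw individual-level records: (anonymous person identifier, search query).
records = [
    ("p1", "weather"),
    ("p1", "football"),
    ("p1", "weather"),
    ("p2", "football"),
    ("p3", "recipes"),
]

def count_occurrences(records):
    """Return {person: Counter({query: number of times the person searched it})}."""
    occurrences = defaultdict(Counter)
    for person, query in records:
        occurrences[person][query] += 1
    return occurrences

occurrences = count_occurrences(records)
print(occurrences["p1"]["weather"])  # number of "weather" searches by p1
```

The same counting could be applied to reference data that only lists individual queries of the population rather than their totals.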

[0021] The term "panel" refers to a defined set of persons of the individual-level data which is a (strict) subset of the population. The panel comprises preferably all persons which fulfill the at least one common demographic attribute of the population. It is however possible to define the set of persons of the panel smaller, e.g. because some persons were not active during the complete time window of analysis or for other reasons.

[0022] The term "occurrence of a search query" can be any information which indicates the occurrence of a search query. It could be the absolute number of times the search query was performed (by a person, a panel or a population). In this case, it is important that the absolute number is always taken for the same period of time. Preferably, but not necessarily, this is the same period of time as used for the behavioural analysis described later. It is however also possible that the occurrence of a search query is indicated as a frequency, i.e. normalized by a time period. This has the advantage that data with different time periods can be used. This has however also the disadvantage that different time periods might distort the results.
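The difference between an absolute count and a frequency can be sketched as follows. The counts, the period lengths and the helper name `frequency` are hypothetical values chosen for illustration only.

```python
def frequency(count, period_days):
    """Occurrence expressed as a frequency: count normalized by the period length."""
    return count / period_days

# Hypothetical example: panel data cover 30 days, reference data cover 90 days.
panel_frequency = frequency(12, 30)          # searches per day in the panel
reference_frequency = frequency(36000, 90)   # searches per day in the population
print(panel_frequency, reference_frequency)
```

Normalizing makes the two sources comparable although they cover different periods, which is the advantage named above; it does not remove effects such as seasonality between the periods.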

[0023] The terms like time window, period of time, time slot, etc. are used interchangeably.

[0024] Fig. 1 shows an embodiment of a system for performing the below described methods. The system comprises a storage means 1 and a processing means 2. The system can be a computer. The system can also comprise two or more interconnected computers. The interconnected computers could be located in the same location as in a data center or in remote locations as for cloud computing. The system can also be realised in a specialised processing chip. Many realizations of the system are possible.

[0025] The storage means 1 stores the individual-level data and the reference data. The storage means 1 can comprise a first storage section for storing the individual-level data. The storage means 1 can comprise a second storage section for storing the reference data. The storage means 1 can comprise a third storage section for storing a computer program with instructions which perform the below described methods, when executed on the processing means 2. The first, second and/or third storage section can be arranged in separate storage devices forming together the storage means or (as logical sections) in the same storage device. The storage means 1 can comprise one, two or more storages devices. The storage devices can be located in the same location or in remote locations. The storage means 1 can be (completely or in part) in the same location as the processing means 2 or in a remote location to the processing means 2.

[0026] The processing means 2 is configured to execute the method described below. Any kind of processing means 2 which allows the below described method to be executed can be used. The processing means 2 can be for example at least one processor. The processing means 2 can comprise one or more processors.

[0027] Preferably, the system comprises further an interface for outputting the result of the below described method. The interface could be a display, a socket for a display, a communication interface like a network or peripheral interface/socket.

[0028] Fig. 2 shows an embodiment of a method for an analysis of the individual-level data. The method is performed on the persons of a defined panel. The panel can be the same for different analyses or can be defined each time in dependence of the analysis. The method can be performed on a defined time window in the individual-level data. However, it is also possible that the method is applied without considering a time window or using the complete individual-level data (in respect to the time).

[0029] In a first step S1, the weights of representativeness are determined for the panel according to the invention based on the individual-level data, in particular based on the search queries of the panel in the individual-level data. The details of the step are described below with the help of Fig. 3.

[0030] In a second step S2, the behavior of each person of the panel is analyzed on the basis of the individual-level data. This analysis results in an analysis result for each person. The analysis is preferably a behavioral analysis. Preferably, the records of each person of the panel are analyzed for certain criteria to obtain an analysis result for each person. Such a criterion could be for example, "has the individual visited football related content on media websites?" to answer the question of whether a person is a football fan. The analysis result would be a binary variable meaning for each person either "yes, this person is a football fan" or "no, this person is no football fan".
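A minimal sketch of such a per-person analysis is given below, assuming that the records of each person have already been reduced to a list of visited content categories. The data, the category labels and the function name `is_football_fan` are hypothetical.

```python
# Hypothetical per-person records: content categories visited on media websites.
visits = {
    "p1": ["news", "football", "weather"],
    "p2": ["cooking"],
    "p3": ["football"],
}

def is_football_fan(categories):
    """Criterion: has the person visited football-related content?"""
    return "football" in categories

# Binary analysis result for each person of the panel.
analysis_result = {person: is_football_fan(cats) for person, cats in visits.items()}
print(analysis_result)
```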

[0031] In a third step S3, the analysis result of the panel is reweighted for representativeness. This is done by weighting the analysis result of each person with the weight of representativeness of this person and combining the weighted results of each person to obtain an analysis result of the population, preferably a behavioural result of the population.

[0032] If the same panel and the same population are used for different analyses, step S1 must be performed only once. The same weights of representativeness can be used in step S3 for different analyses so that the step S1 does not need to be repeated for each analysis. Only when a new panel or a new time period is selected do the weights of representativeness need to be determined again in step S1.
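The reweighting of step S3 can be sketched as follows for a binary analysis result. The description only states that the weighted results are combined; the weighted average used here is one assumed way of combining them, and the weights, results and the function name `population_estimate` are hypothetical.

```python
def population_estimate(results, weights):
    """Weighted share of persons with a positive analysis result (assumed combination)."""
    total = sum(weights[p] for p in results)
    positive = sum(weights[p] for p, is_positive in results.items() if is_positive)
    return positive / total

# Hypothetical binary analysis results from step S2 and weights from step S1.
results = {"p1": True, "p2": False, "p3": True}
weights = {"p1": 0.5, "p2": 2.0, "p3": 1.5}

estimate = population_estimate(results, weights)   # weighted share
unweighted = sum(results.values()) / len(results)  # raw panel share, for comparison
print(estimate, unweighted)
```

Because the weights are independent of the analysis, the same `weights` dictionary could be reused for any number of analyses on the same panel and time period.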

[0033] Fig. 3 shows an embodiment for determining the weights of representativeness based on the individual-level data.

[0034] In a step S11, the method is initialized. This could comprise: 1) the selection of some parameters such as the at least one common demographic attribute for defining the population, 2) the identification of the persons in the individual-level data corresponding to the population or with at least one common demographic data for the panel, 3) the number of persons I included in the panel (preferably all persons of the individual-level data belonging to the population) and/or 4) the number of search queries J to be considered in each iteration. The number of search queries J is adjusted based on tests of algorithm performance and is preferably larger than 50, preferably than 100, preferably than 200, preferably than 500, preferably than 1000. The persons of the panel are labelled $i = 1, \dots, I$. This labelling is just an arbitrary identifier used for distinguishing the persons. This labelling shall not be limitative for the invention. The initialization step S11 can also be omitted, if those parameters do not change.

[0035] In a preferred embodiment, the method is iterative such that the steps S12 to S18 are performed at least twice and/or are performed until a certain stop criterion is fulfilled. In this case, at least one of the method steps of the r-th iteration is based on the weight of representativeness $\omega_i^{(r-1)}$ of the previous iteration r-1. In this case, the weight of representativeness $\omega_i^{(0)}$ of the zeroth iteration used for the first iteration r=1 is set for all persons $i = 1, \dots, I$ of the panel to the same initial value which is preferably one: $\omega_i^{(0)} = 1$, for all persons $i = 1, \dots, I$. However, it is also possible to determine the weight of representativeness in one run, i.e. not iteratively, such that the step S17 is not necessary and the steps S12 to S15 are performed just once.

[0036] In a step S12, a set of search queries is defined. Preferably, J search queries are selected. Preferably, the number J is equal for each iteration, preferably as selected in the initialization step. However, it is also possible to select a different number J in each iteration r. The J search queries of the individual-level data are preferably selected randomly. However, it is also possible to select the J search queries by other criteria, e.g. their occurrence (e.g. the J most used search queries in the individual-level data or in the population) or their order (e.g. the J first search queries). However, the best results are achieved if the J (distinct) search queries are selected randomly. Preferably, the J search queries are distinct from each other. A set of J distinct search queries $k_1^{(r)}, \dots, k_J^{(r)}$ is selected (in the r-th iteration) from the individual-level data as described above.

[0037] In step S13, the occurrences of the J search queries are retrieved. The occurrence $x_{ij}^{(r)}$ of the search query $k_j^{(r)}$ for each user $i = 1, \dots, I$ is retrieved from the individual-level data. The occurrence $x_{ij}^{(r)}$ is preferably the number of times that the user i has searched the search query $k_j^{(r)}$. To retrieve the occurrence $x_{ij}^{(r)}$ from the individual-level data means that the occurrence $x_{ij}^{(r)}$ can be already stored in the individual-level data and just be read, or the individual-level data can be processed to determine the occurrence $x_{ij}^{(r)}$. Further, the occurrence $N_j^{(r)}$ of each search query j in the population is retrieved from the reference data.

[0038] In step S14, a relative weight of each search query in the panel is determined. The relative weight $R_j^{(r)}$ for the search query j is determined based on the combination $x_j^{(r)}$ of the occurrences of the search query j for all users $i = 1, \dots, I$ and on the occurrence $N_j^{(r)}$ of the search query j in the population (retrieved from the reference data). Preferably, the relative weight $R_j^{(r)}$ for the search query j is determined based on the ratio

$$R_j^{(r)} = \frac{x_j^{(r)}}{N_j^{(r)}}$$

of the combination $x_j^{(r)}$ of the occurrences of the search query j for all users $i = 1, \dots, I$ and of the occurrence $N_j^{(r)}$ of the search query j in the population. In other words, the relative weight $R_j^{(r)}$ for the search query j is determined based on the ratio of the sample volume of the search query j and the population volume of the search query j. The sample volume and/or the combination $x_j^{(r)}$ of the occurrences of the search query j is determined based on or is equal to the sum $\sum_{i=1}^{I} x_{ij}^{(r)}$ over the persons i of the panel of the occurrences of the search query j of the person i. Preferably, the combination $x_j^{(r)}$ of the occurrences of the search query j for all users $i = 1, \dots, I$ and/or the sum over the persons i of the panel of the occurrences of the search query j of the person i is based on corrected occurrences $\omega_i^{(r-1)} x_{ij}^{(r)}$ of the search query j of the i-th person, each of which is obtained by a combination, preferably the multiplication, of the occurrence $x_{ij}^{(r)}$ of the j-th search query of the i-th user and the i-th user's weight of representativeness $\omega_i^{(r-1)}$ of the last iteration r-1:

$$x_j^{(r)} = \sum_{i=1}^{I} \omega_i^{(r-1)} \, x_{ij}^{(r)}$$

However, it is also possible to determine the relative weight on the basis of the non-corrected occurrences.
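Steps S12 to S14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the method: the toy occurrences, the reference totals and the names `select_queries` and `relative_weights` are hypothetical, and the occurrences are kept in plain dictionaries for readability.

```python
import random

# Hypothetical occurrences: x[person][query] = number of times the person searched the query.
x = {
    "p1": {"weather": 2, "football": 1},
    "p2": {"football": 3},
    "p3": {"recipes": 1, "weather": 1},
}
# Hypothetical reference data: occurrence of each query in the population.
N = {"weather": 300, "football": 200, "recipes": 100}
# Weights of representativeness of the previous iteration (all one before the first iteration).
omega_prev = {person: 1.0 for person in x}

def select_queries(x, J, rng):
    """Step S12: randomly select J distinct search queries from the individual-level data."""
    all_queries = sorted({q for counts in x.values() for q in counts})
    return rng.sample(all_queries, J)

def relative_weights(x, N, omega_prev, queries):
    """Steps S13/S14: weight-corrected sample volume divided by population volume."""
    R = {}
    for q in queries:
        sample_volume = sum(omega_prev[p] * x[p].get(q, 0) for p in x)
        R[q] = sample_volume / N[q]
    return R

queries = select_queries(x, J=2, rng=random.Random(0))
R = relative_weights(x, N, omega_prev, ["weather", "football"])
print(queries, R)
```

A seeded `random.Random` instance is used only to make the illustration repeatable; a query that is over-represented in the panel relative to the population receives a larger relative weight.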

[0039] In step S15, an intermediate weight of representativeness is determined for each person of the panel. Preferably, the intermediate weight of representativeness for the person i is determined based on the weighted sum $\sum_{j=1}^{J} R_j^{(r)} x_{ij}^{(r)}$ over all J search queries j of the multiplication of the occurrence $x_{ij}^{(r)}$ of the search query j for this person i with the relative weight $R_j^{(r)}$ of the search query j of the defined set, and based on a sum $\sum_{j=1}^{J} R_j^{(r)}$ over the J search queries of the weight $R_j^{(r)}$ of each search query j. Preferably, the intermediate weight of representativeness for the person i is determined based on a ratio of a sum of an offset and of the sum over all search queries of the defined set of search queries of the weight of each search query of the defined set of search queries divided by a sum of the offset and of the weighted sum, written here with an offset of 1:

$$\frac{1 + \sum_{j=1}^{J} R_j^{(r)}}{1 + \sum_{j=1}^{J} R_j^{(r)} \, x_{ij}^{(r)}}$$

The offset is preferably selected as 1, but can be another number. The offset is preferably non-zero. The non-zero offset is added to the numerator and denominator of the weighted average computation so as to ensure that the adjustment value is nonzero and definite. Preferably, the intermediate weight of representativeness for the person i is determined based on the inverse of this offset-corrected weighted average, i.e. on the ratio given above.

Recall that for each individual, we have occurrences $x_{i1}^{(r)}, \dots, x_{iJ}^{(r)}$ recording how many times that individual has searched each of the current round's J randomly-selected keywords. The adjustment to that individual's weight for the current round is the inverse of the average of these counts weighted by the relative weights (possibly corrected by the mentioned offset in the numerator and the denominator).
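The intermediate weight of step S15 can be sketched as a small function. The relative weights and occurrences below are hypothetical values, and the function name `intermediate_weight` is chosen for this example only.

```python
def intermediate_weight(x_i, R, offset=1.0):
    """Ratio (offset + sum of relative weights) / (offset + weighted sum of occurrences).

    x_i: occurrences of the selected queries for one person, {query: count}.
    R:   relative weights of the selected queries, {query: weight}.
    """
    weighted_sum = sum(R[q] * x_i.get(q, 0) for q in R)
    weight_sum = sum(R.values())
    return (offset + weight_sum) / (offset + weighted_sum)

# Hypothetical relative weights of two selected queries.
R = {"weather": 0.5, "football": 1.5}

heavy_user = intermediate_weight({"football": 3}, R)  # searched an over-represented query often
inactive_user = intermediate_weight({}, R)            # searched none of the selected queries
print(heavy_user, inactive_user)
```

With a non-zero offset the denominator stays positive even for a person who searched none of the selected queries, so the result is always defined and greater than zero.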

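[0040] If the weight of representativeness is calculated non-iteratively, the intermediate weight of representativeness corresponds to the final weight of representativeness. Otherwise, in step S16 the weight of representativeness of each person i is calculated based on the intermediate weight of representativeness of the respective person i combined, preferably multiplied, with the weight of representativeness of this person i of the previous iteration r-1. In the preferred case of multiplication, and with the intermediate weight of step S15 written with an offset of 1:

$$\omega_i^{(r)} = \omega_i^{(r-1)} \cdot \frac{1 + \sum_{j=1}^{J} R_j^{(r)}}{1 + \sum_{j=1}^{J} R_j^{(r)} \, x_{ij}^{(r)}}$$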

[0041] If the weight of representativeness is calculated iteratively, it is checked in step S17, if a stopping condition is fulfilled. If the stopping condition is fulfilled, the method ends in step S18, otherwise the steps S12 to S17 are repeated as described above. Many stopping conditions are possible. A preferred stopping condition is that the total difference between the weights of representativeness at the end of the iteration r and their corresponding values at the end of the previous iteration r-1 falls below a set threshold for n successive iterations:

$$\sum_{i=1}^{I} \left| \omega_i^{(r)} - \omega_i^{(r-1)} \right| < \text{(threshold value)}$$

n is preferably at least two. However, it is also possible that n = 1. This stopping condition can also be combined with a maximum number of iterations: $r = r_{\text{MAX}}$. However, other stopping conditions are also possible.
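The complete iteration (steps S12 to S17) can be sketched as one loop. This is a simplified illustration under several assumptions: the total difference is taken as the sum of absolute differences, the offset is 1, and the parameter names `threshold`, `n`, `r_max` and `seed` as well as the toy data are hypothetical. It makes no claim about convergence; the maximum number of iterations guards against runs in which the threshold is never reached.

```python
import random

def iterate_weights(x, N, J, threshold=1e-3, n=2, r_max=100, offset=1.0, seed=0):
    """Iteratively determine weights of representativeness (sketch of steps S12 to S17)."""
    rng = random.Random(seed)
    all_queries = sorted({q for counts in x.values() for q in counts})
    omega = {p: 1.0 for p in x}  # zeroth iteration: all weights set to one
    below = 0                    # successive iterations below the threshold
    for r in range(1, r_max + 1):
        # S12: select J distinct queries at random.
        queries = rng.sample(all_queries, min(J, len(all_queries)))
        # S13/S14: relative weight = weight-corrected sample volume / population volume.
        R = {q: sum(omega[p] * x[p].get(q, 0) for p in x) / N[q] for q in queries}
        # S15/S16: intermediate weight multiplied with the previous weight.
        new = {}
        for p in x:
            weighted_sum = sum(R[q] * x[p].get(q, 0) for q in queries)
            new[p] = omega[p] * (offset + sum(R.values())) / (offset + weighted_sum)
        # S17: stop after n successive iterations with a small total change.
        diff = sum(abs(new[p] - omega[p]) for p in x)
        omega = new
        below = below + 1 if diff < threshold else 0
        if below >= n:
            break
    return omega, r

# Hypothetical toy data: p1 searched query "a" five times, p2 once.
x = {"p1": {"a": 5}, "p2": {"a": 1}}
N = {"a": 600}
omega, iterations = iterate_weights(x, N, J=1)
print(omega, iterations)
```

In this toy run the person who searched the selected query more often ends with the smaller weight, which is the intended direction of the adjustment.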

[0042] In an alternative embodiment, it is also possible to determine the weights of representativeness based on the occurrences of e-commerce shopping events for this person of a defined set of e-commerce shopping events of the individual-level data and based on the occurrence of the e-commerce shopping events of the same set of e-commerce shopping events in the reference data. The above description applies analogously for this alternative embodiment, wherein the search queries above are replaced by e-commerce shopping events.

[0043] The term e-commerce shopping event shall be an event indicating an e-commerce shopping activity of a person or of persons. Preferably, the e-commerce shopping event (of a person) comprises search queries and/or acquisitions (of this person) on one or multiple e-commerce shopping website(s)/platform(s). For example, the search queries and/or acquisitions of a person on e-commerce shopping websites like amazon (registered trademark), ebay (registered trademark) or any other e-commerce shop could be considered to be an e-commerce shopping activity. The e-commerce shopping event could comprise a certain product (identified for example by an electronic product code or any other identifier of the product) or a product category.

[0044] In this alternative embodiment, the individual-level data comprise at least the e-commerce shopping events performed by the persons (of the panel). Thus, the individual-level data comprise for each person the e-commerce shopping events performed (maybe instead of the search queries). Preferably, the individual-level data comprise for each e-commerce shopping event the e-commerce shopping event and the identifier identifying (maybe anonymously) the person performing the event.

[0045] The term "reference data" shall comprise any data which indicate the occurrence of different e-commerce shopping events in the population (maybe instead of the occurrences of the search queries in the population).

[0046] The term "occurrence of an e-commerce shopping event" can be any information which indicates the occurrence of a defined e-commerce shopping event, e.g. the search and/or acquisition of a certain product or of a product from a product category. It could be the absolute number of times the e-commerce shopping event was performed or a frequency of the e-commerce shopping event.
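As an illustration of the terms above, the following sketch derives per-person occurrences of e-commerce shopping events from individual-level data. It assumes, purely for illustration, that each record is a pair of an anonymous person identifier and an event label such as a product category, and that the occurrence is the absolute number of times the event was performed. The names `occurrences_per_person`, `records` and `event_set` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def occurrences_per_person(
    records: Iterable[Tuple[str, str]],  # (person identifier, event label)
    event_set: Set[str],                 # the defined set of shopping events
) -> Dict[str, Dict[str, int]]:
    """Count, for each person of the panel, how often each event of the
    defined set occurs in the individual-level data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for person_id, event in records:
        if event in event_set:  # ignore events outside the defined set
            counts[person_id][event] += 1
    return {person: dict(c) for person, c in counts.items()}


# Illustrative data only; the labels are product categories.
records = [
    ("p1", "shoes"),
    ("p1", "shoes"),
    ("p2", "books"),
    ("p2", "toys"),
]
panel_occurrences = occurrences_per_person(records, {"shoes", "books"})
```

Reference data could be held in the same shape, as a mapping from each event label to its occurrence in the population, so that both sources refer to the same set of events.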

Claims

1. Computerized method for determining weights of representativeness in individual-level data, wherein the individual-level data comprise for a panel of persons search queries performed by each person of the panel, wherein the panel of persons of the individual-level data is a subset of a predetermined population of persons, wherein reference data comprises the occurrences of the search queries performed by the persons of the population, the method comprising the step of:

determining, in a processing means (2), for each person of the panel the weight of representativeness of this person based on the occurrences of the search queries for this person of a defined set of search queries of the individual-level data and based on the occurrence of the search queries of the same set of search queries in the reference data.

2. Method according to claim 1, wherein the weight of representativeness of each person is further based on the occurrences of the search queries for all persons of the panel of the defined set of search queries of the individual-level data.

3. Method according to one of the previous claims, wherein the weights of representativeness are determined iteratively.

4. Method according to the previous claim, wherein the weight of representativeness of each person in a second or higher iteration is further based on the weight of representativeness of this person from the previous iteration.

5. Method according to the previous claim, wherein the weight of representativeness of each person in a second or higher iteration is further based on the weight of representativeness of all persons of the panel from the previous iteration.

6. Method according to one of claims 4 to 5, wherein the following steps are performed in the processing means (2) in each iteration:

selecting the defined set of search queries in the individual-level data for this iteration;

determining for each person of the panel an intermediate weight of representativeness of this person based on the occurrences of the search queries for this person, preferably for all persons of the panel of the set of search queries selected in this iteration and based on the occurrences of the search queries of the same set of search queries in the reference data, and

determining for each person of the panel the weight of representativeness of this person based on the intermediate weight of representativeness of this person.

7. Method according to the previous claim, wherein for the second or higher iteration, the weight of representativeness of each person of the panel is based on the intermediate weight of representativeness of this person of this iteration and the weight of representativeness of this person from the previous iteration.

8. Method according to one of the previous claims, wherein the weight of representativeness of each person or the intermediate weight of representativeness of each person is determined, in the processing means (2), based on a weighted sum over all search queries of the defined set of search queries of the multiplication of the occurrence of each search query of the defined set of search queries for this person with a weight of each search query of the defined set of search queries.

9. Method according to the previous claim, wherein the weight of representativeness of each person or the intermediate weight of representativeness of each person is further determined, in the processing means (2), based on a sum over all search queries of the defined set of search queries of the weight of each search query of the defined set of search queries.

10. Method according to claim 8 or 9, wherein the weight of representativeness of each person or the intermediate weight of representativeness of each person is determined, in the processing means (2), based on a ratio of a sum of an offset and of the sum over all search queries of the defined set of search queries of the weight of each search query of the defined set of search queries divided by a sum of the offset and of the weighted sum.

11. Method according to one of claims 8 to 10, wherein the weight of each search query is determined based on the ratio of the cumulative occurrence of this search query of all persons of the panel divided by the occurrence of this search query in the population.

12. Method according to claim 11 and one of claims 4 to 7, wherein the cumulative occurrence of this search query of all persons of the panel is based on the sum over all persons of the panel of the multiplication of the occurrence of this search query of each person of the panel with the weight of representativeness of the respective person from the previous iteration.

13. Computerized method for behavioral analysis in individual-level data comprising the following steps:

determining weights of representativeness of each person of a panel of the individual-level data for a population according to the method according to one of the previous claims;

analyzing the behavior of each person of the panel on the basis of the individual-level data to obtain a behavioral result for each person;

determining a behavioral result of the population based on the combination of the results of the analyzed behavior of each person of the panel weighted by the determined weight of representativeness of the respective person.

14. Computer program comprising a set of instructions configured to perform the steps of the method according to one of the previous claims, when executed on a processing means (2).

15. System comprising:

storage means (1) storing individual-level data and reference data, wherein the individual-level data comprise for a panel of persons search queries performed by each person of the panel, wherein the panel of persons of the individual-level data is a subset of a predetermined population of persons, wherein reference data comprises the occurrences of the search queries performed by the persons of the population; and

processing means (2) configured to determine for each person of the panel the weight of representativeness of this person based on the occurrences of the search queries for this person of a defined set of search queries of the individual-level data and based on the occurrence of the search queries of the same set of search queries in the reference data.

16. System according to claim 15, wherein the processing means (2) is further configured to:

analyze the behavior of each person of the panel on the basis of the individual-level data to obtain a behavioral result for each person; and