CN112685654B - Student identification method and device, computing equipment and readable computer storage medium - Google Patents

Student identification method and device, computing equipment and readable computer storage medium

Info

Publication number
CN112685654B
Authority
CN
China
Prior art keywords
campus
student
base station
data
mobile phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910990107.8A
Other languages
Chinese (zh)
Other versions
CN112685654A (en
Inventor
钱慧如
郑欢
许乐静
傅泉辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Zhejiang Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201910990107.8A priority Critical patent/CN112685654B/en
Publication of CN112685654A publication Critical patent/CN112685654A/en
Application granted granted Critical
Publication of CN112685654B publication Critical patent/CN112685654B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Telephonic Communication Services (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The embodiment of the invention relates to the technical field of information, and discloses a student identification method and device, a computing device and a readable computer storage medium. The method comprises the following steps: acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time for which each roaming number has appeared in the corresponding campus does not exceed a preset value; obtaining attribute data of the plurality of roaming numbers, each item of attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user, and the behavior data of the corresponding user; and identifying students from the campus roaming number set based on the attribute data. In this way, the embodiment of the invention can improve the accuracy of student number identification.

Description

Student identification method and device, computing equipment and readable computer storage medium
Technical Field
The embodiment of the invention relates to the technical field of information, in particular to a student identification method and device, a computing device and a readable computer storage medium.
Background
Campus marketing is marketing aimed at students. At the start of each school year, various marketing activities take place on campuses, for example for telephone cards, daily necessities and school supplies. Most of these activities are offline, but with the development of the Internet of Things, online marketing (e.g., via WeChat) is becoming popular and has become an important channel for merchants.
In practice, advertising is a means of marketing goods: it lets consumers learn about a product and attracts them to buy it. Many merchants publicize their goods through network television advertisements, billboards and the like, and pushing advertisements to consumers' mobile phones is another option available to them. When merchants run campus marketing campaigns, they push product information to student mobile phone numbers acquired in advance. How accurately students are identified from mobile phone numbers therefore matters, because it directly affects the marketing result.
In the prior art, student mobile phone numbers can be obtained from identity information. For example, a new number appearing on a campus is combined with the identity and age information of the number's registered owner to judge whether the target number belongs to a student on that campus. The drawback of this approach is that the registered owner may not be the actual user, or the owner's identity data may be missing, so the number of student mobile phone numbers that can be obtained is limited.
Another way is to look in the campus's historical data for clues that a number belongs to a student, for example whether the number has dialed a college entrance hotline. However, the number that dials such a hotline may belong to a parent, so the accuracy of the student numbers obtained in this way is not high.
A third method analyzes communication data after new numbers arrive at school and judges numbers that frequently contact other students to be student numbers. This method presupposes that the identities of the other students have already been identified accurately, so it needs accurate initial data before it can start effectively.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a student identification method and apparatus, a computing device, and a computer storage medium, which overcome the foregoing problems.
According to an aspect of an embodiment of the present invention, there is provided a student identification method, including: acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value; obtaining attribute data of the plurality of roaming numbers, each of the attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user; identifying students from the campus roaming number set based on the attribute data.
According to another aspect of an embodiment of the present invention, there is provided a student identification apparatus, including: a set acquisition module, configured to acquire a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time for which each roaming number has appeared in the corresponding campus does not exceed a preset value; a data obtaining module, configured to obtain attribute data of the roaming numbers, where each item of attribute data includes one or more of the following: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user, and the behavior data of the corresponding user; and an identification module, configured to identify students from the campus roaming number set based on the attribute data.
According to another aspect of embodiments of the present invention, there is provided a computing device including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the steps of the student identification method.
According to a further aspect of an embodiment of the present invention, there is provided a computer-readable storage medium having at least one executable instruction stored therein, the executable instruction causing the processor to perform the steps of the student identification method described above.
According to the embodiment of the invention, the campus roaming number set is obtained firstly, each number in the campus roaming number set is identified based on the activity record of the user of the mobile phone number, the real-time mobile phone data stream, the user behavior data and the like, and the student number list is output, so that the accuracy of student number identification can be improved.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the description, and so that the foregoing and other objects, features and advantages become more apparent, specific embodiments of the present invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic flow chart illustrating a student identification method according to a first embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating a specific procedure of step S13 of the student identification method according to the first embodiment of the present invention;
fig. 3 is a specific flowchart of step S131 of the student identification method according to the first embodiment of the present invention;
fig. 4 is a schematic flowchart illustrating a specific step S1311 of the student identification method according to the first embodiment of the present invention;
fig. 5 is a schematic flow chart illustrating a specific flow of step S42 of the student identification method according to the first embodiment of the present invention;
fig. 6 is a specific flowchart illustrating step S1312 of the student identification method according to the first embodiment of the present invention;
fig. 7 is a schematic flow chart illustrating a specific flow of step S62 of the student identification method according to the first embodiment of the present invention;
fig. 8 is a schematic flow chart illustrating a specific process of step S622 of the student identification method according to the first embodiment of the present invention;
fig. 9 is a schematic specific flowchart of step S1313 of the student identification method according to the first embodiment of the present invention;
fig. 10 is a flowchart illustrating a student identification method according to a second embodiment of the present invention;
fig. 11 is a detailed flowchart illustrating step S103 of the student identification method according to the second embodiment of the present invention;
fig. 12 is a flowchart illustrating a student identification method according to a third embodiment of the present invention;
fig. 13 is a detailed flowchart illustrating step S123 of the student identification method according to the third embodiment of the present invention;
fig. 14 is a flowchart illustrating a student identification method according to a fourth embodiment of the present invention;
fig. 15 is a schematic specific flowchart of step S143 of the student identification method according to the fourth embodiment of the present invention;
fig. 16 is a schematic structural view showing a student identification device provided in a fifth embodiment of the present invention;
fig. 17 shows a schematic structural diagram of a computing device provided by a sixth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In a first embodiment of the present invention, fig. 1 shows a flowchart of a student identification method provided in an embodiment of the present invention. As shown in fig. 1, the student identification method includes:
step S11: acquiring a campus roaming number set.
Specifically, the campus roaming number set is built from the numbers that enter the area where a campus is located and are captured by a base station associated with the campus. Numbers resident on the campus are removed from the captured numbers, and the remaining numbers form the campus roaming number set; that is, the time for which each remaining number has appeared on the campus does not exceed a preset value. The preset value can be set according to actual conditions, for example half a year. For instance, the numbers captured by the campus-associated base stations within the year before the day school opens are obtained, the numbers that appeared for half a year or more are removed, and the rest form the campus roaming number set. A resident number is one whose presence at the associated base station reaches half a year or more; such a number can be considered to belong to someone who resides on the campus, whereas a number present for less than half a year can be considered a roaming number. The length of time may be the accumulated time of appearance, for example: if the total time of appearance within one year reaches half a year or more, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured numbers, and that a base station may be associated with multiple schools, so the campus roaming number set may include the roaming number sets of multiple schools. Preferably, the set is divided by school, that is, each school forms its own campus roaming number list. A roaming number may belong to a parent, classmate or friend of a student, or to a student at the school.
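The filtering rule of step S11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `campus_roaming_set`, the input shape (pairs of number and capture date) and the value of 183 days for "half a year" are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def campus_roaming_set(sightings, threshold_days=183):
    """Return the numbers whose accumulated presence stays below the threshold.

    sightings: iterable of (number, day) pairs captured by the
    campus-associated base stations over the look-back window (assumed shape).
    threshold_days: assumed stand-in for "half a year".
    """
    days_seen = defaultdict(set)
    for number, day in sightings:
        days_seen[number].add(day)  # accumulate distinct days of appearance
    # Numbers seen on threshold_days or more distinct days count as resident
    # and are removed; the rest form the roaming set.
    return {n for n, days in days_seen.items() if len(days) < threshold_days}


# Number "A" appears on 200 days (resident), number "B" on 10 days (roaming).
sightings = [("A", date(2019, 1, 1) + timedelta(days=k)) for k in range(200)]
sightings += [("B", date(2019, 9, 1) + timedelta(days=k)) for k in range(10)]
roaming = campus_roaming_set(sightings)
```

Counting distinct days is one way to realize the "accumulated time of appearance" mentioned above; an operator could equally accumulate hours or sessions.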
Step S12: acquiring attribute data of the roaming numbers;
specifically, attribute data is obtained for each number in the campus roaming number set $L_{SchRoam}$. The attribute data may include one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the user, and the behavior data of the user. That is, the attribute data may be any one of these three items, or may include all three, which is not limited herein. Further, the activity record set $R_{histRoam}$ of all numbers in the campus roaming number set $L_{SchRoam}$ over the last year can be extracted from the big data platform. The activity record $r_{i,t,bs}$ corresponding to each number in the activity record set includes: the number $i \in L_{SchRoam}$, the record time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
The real-time mobile phone data stream $r_{i,t,grid}$ of the user is obtained from a base station associated with the campus and includes: the number $i \in L_{SchRoam}$, the record time t, the recording base station bs, the base station longitude and latitude (lon/lat), the accurate positioning grid number grid, the grid center longitude and latitude (grid/grid), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The real-time mobile phone data stream $R_{RTAccu}$ corresponding to the campus roaming number set comprises the real-time mobile phone data stream $r_{i,t,grid}$ of each number. The behavior data comes from the real-time mobile phone data stream $R_{RT}$ of the campus-associated base station: $R_{RT}$ is matched against the campus roaming number set $L_{SchRoam}$, and the numbers that do not belong to the campus roaming number set are removed, giving the mobile phone data stream $R_{RTSchoam}$. $R_{RTSchoam}$ is then classified and summarized by number i to obtain, for the user of each number $i \in L_{SchRoam}$, the short-term mobile phone data stream $R_{i,RTSchoam}$ accumulated on the campus. Once the mobile phone data stream $R_{i,RTSchoam}$ of each user i is obtained, behavior data coding is performed for each mobile phone number: the mobile phone browsing records $r_{i,t,loc,url}$ are extracted from the user's stream $R_{i,RTSchoam}$, each including: $i \in L_{SchRoam}$, the record time t, the recording base station bsid, the base station longitude and latitude (lon/lat), the access page classification (pageTypeId), the APP used (appId), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the like.
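As an illustration of the data described above, the following sketch models one activity record and the step that matches the real-time stream against the campus roaming number set and groups it by number. The class name `ActivityRecord`, the function name `per_number_streams` and the field types are assumptions; only the field meanings (number, time, bs, lon/lat, actType, conNum, direction) come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    number: str     # i, a mobile phone number
    time: str       # t, record time (ISO-8601 string, an assumption)
    bs: str         # recording base station
    lon: float      # base station longitude
    lat: float      # base station latitude
    act_type: str   # actType: "call", "SMS" or "other"
    con_num: str    # conNum, opposite-end number
    direction: str  # "in" or "out"


def per_number_streams(stream, roaming_numbers):
    """Drop records whose number is not in the roaming set, then group by number."""
    grouped = defaultdict(list)
    for rec in stream:
        if rec.number in roaming_numbers:
            grouped[rec.number].append(rec)
    return dict(grouped)


stream = [
    ActivityRecord("A", "2019-09-01T09:00:00", "bs1", 120.1, 30.2, "call", "X", "out"),
    ActivityRecord("Z", "2019-09-01T09:05:00", "bs1", 120.1, 30.2, "SMS", "Y", "in"),
    ActivityRecord("A", "2019-09-01T10:00:00", "bs2", 120.2, 30.3, "other", "", "out"),
]
streams = per_number_streams(stream, roaming_numbers={"A", "B"})
```

Here `streams["A"]` plays the role of the per-number stream $R_{i,RTSchoam}$; the record for "Z" is discarded because "Z" is not in the roaming set.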
Step S13: identifying students from the campus roaming number set based on the attribute data;
specifically, each number in the campus roaming number set is identified based on the attribute data to obtain a corresponding identification result, where the identification result at least includes a student number list, and may also include a parent number list and an uncertain list, which is not limited herein.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In a preferred embodiment of the present invention, the attribute data includes the activity records of the users corresponding to each roaming number in a preset time period, the real-time mobile phone data streams, and the behavior data of the corresponding users. Referring to fig. 2, step S13 includes:
step S131, identifying numbers of students from the campus roaming number set respectively based on activity records of corresponding users in preset time periods, real-time mobile phone data streams and behavior data of corresponding users to obtain corresponding student lists;
specifically, numbers of students are identified from the campus roaming number set based on the activity records of the corresponding users in a preset time period, the real-time mobile phone data streams and the behavior data of the corresponding users, respectively, so as to obtain corresponding student lists. The student lists may include the identification result based on the activity records, the identification result based on the mobile phone data streams and the identification result based on the behavior data.
Step S132, merging the obtained student lists, and outputting student identification results;
specifically, the identification result based on the activity records, the identification result based on the mobile phone data streams and the identification result based on the behavior data are input into a merging model and merged. The model outputs the student identification result, from which the final student number list is obtained and the current student number list is updated. In addition, the student identification result may further include a parent number list.
In the embodiment of the invention, number recognition is firstly carried out based on the activity record, the mobile phone data stream and the behavior data respectively, then all number recognition results are input into the merging model for learning training to obtain a final student number list, and the accuracy and the reliability of recognition can be further improved.
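The description does not specify the internals of the merging model beyond saying that it is trained. As a deliberately simpler stand-in, the sketch below merges the three per-source student lists by majority vote; the function name `merge_by_vote` and the `min_votes` parameter are hypothetical and are not the patent's trained model.

```python
from collections import Counter


def merge_by_vote(lists, min_votes=2):
    """Keep a number if at least min_votes of the per-source lists contain it."""
    votes = Counter(n for lst in lists for n in set(lst))
    return {n for n, v in votes.items() if v >= min_votes}


by_activity = {"A", "B"}   # result based on activity records
by_stream = {"B", "C"}     # result based on real-time data streams
by_behavior = {"A", "B"}   # result based on behavior data
merged = merge_by_vote([by_activity, by_stream, by_behavior])
```

A vote threshold trades recall for precision: requiring all three sources to agree (`min_votes=3`) keeps only the numbers every recognizer flagged.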
In the embodiment of the present invention, referring to fig. 3, the step S131 includes:
step S1311, identifying student numbers from the campus roaming number set based on the activity records in the preset time period, and obtaining identification results based on the activity records;
specifically, numbers of students are identified from the campus roaming number set based on the activity records of each number in a preset time period, and a student number list based on the activity records is obtained. The identification operation is executed on each number in the campus roaming number set to obtain a corresponding identification result; all the identification results form an identification list, which comprises a student number list, an accompanying number list of each number, and the like. The specific value of the preset time period may be set according to the actual situation and is not limited herein, for example: half a year, one year, or two years. Preferably, the preset time period is one year.
Step S1312, identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
specifically, each number in the campus roaming number set is identified using the real-time mobile phone data stream acquired from the base station to obtain the identification result of that number. The identification results of all numbers form the identification result based on the mobile phone data streams, which comprises a student number list and an accompanying number list of each student number.
Step S1313: identifying numbers of students from the campus roaming number set based on the behavior data to obtain identification results based on the behavior data;
specifically, each number in the campus roaming number set is identified using the behavior data acquired from the base station to obtain the identification result of that number. The identification results of all numbers form the identification result based on the behavior data, which comprises a student number list and an accompanying number list of each student number.
It should be noted that the order of step S1311, step S1312 and step S1313 is not limited: the three steps may be performed one after another in any order, or simultaneously.
In the embodiment of the present invention, as shown in fig. 4, step S1311 specifically includes:
step S41, acquiring the activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
specifically, the activity record set $R_{histRoam}$ of all numbers in the campus roaming number set $L_{SchRoam}$ of the campus over the last year is first extracted from a big data platform. The activity record $r_{i,t,bs}$ corresponding to each number in the activity record set includes: the number $i \in L_{SchRoam}$, the record time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The activity records of each number i are collected into the activity record set $R_{histRoam}$;
Step S42, acquiring base station data corresponding to each number based on the activity record set;
specifically, each activity record is analyzed to identify the base station data of the corresponding number, for example the residence base station $BS_{i,r}$ and the workplace base station $BS_{i,p}$ of the user of the number (for students, the workplace base station is the school base station); that is, the user's place of residence and place of work are identified;
s43, acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of the corresponding base station;
specifically, after the workplace base station and the residence base station of the user of each number are obtained, the numbers appearing at the same base station are collected for workplace base stations and residence base stations respectively. The resulting number sets include target number sets and accompanying number sets, where the target number sets comprise the number set of each residence base station and the number set of each workplace base station. Based on the residence base station of each number, a residence number set is formed for each residence base station, and the residence accompanying number set of each number is obtained from these residence number sets. Likewise, based on the workplace base station of each number, a workplace number set is formed for each workplace base station, and the workplace accompanying number set of each number is obtained from these workplace number sets. For example, the numbers with the same residence base station $BS_{i,r}$ are collected into the number set of that residence base station, and the numbers with the same workplace base station $BS_{i,p}$ are collected into the number set of that workplace base station. According to the workplace base station $BS_{i,p}$, an accompanying number set is obtained for each number:
Figure BDA0002237981260000081
According to the residence base station $BS_{i,r}$, an accompanying number set is obtained for each number:
Figure BDA0002237981260000091
It should be noted that the workplace base station, the workplace number set and the workplace accompanying number set may be obtained first and the residence base station, the residence number set and the residence accompanying number set afterwards, or both may be obtained at the same time, which is not limited herein.
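The grouping in step S43 can be sketched as follows. The function names and the dictionary input are hypothetical, and treating a number's accompanying set as "every other number on the same base station" is an assumption, since the defining formulas are only available as images in the source.

```python
from collections import defaultdict


def group_by_base_station(bs_of_number):
    """Map each base station to the set of numbers assigned to it."""
    sets = defaultdict(set)
    for number, bs in bs_of_number.items():
        sets[bs].add(number)
    return dict(sets)


def accompanying_sets(bs_of_number):
    """For each number, the other numbers sharing its base station (assumed definition)."""
    by_bs = group_by_base_station(bs_of_number)
    return {n: by_bs[bs] - {n} for n, bs in bs_of_number.items()}


# Residence base station of each number; the same functions apply to workplace base stations.
residence_bs = {"n1": "bsX", "n2": "bsX", "n3": "bsY"}
residence_sets = group_by_base_station(residence_bs)
residence_companions = accompanying_sets(residence_bs)
```

Running the same two functions once on the residence mapping and once on the workplace mapping yields the residence and workplace number sets and their accompanying sets, in either order.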
Step S44, acquiring a student number set corresponding to each school and an accompanying number set of each number based on the number sets;
specifically, a student number set corresponding to each school and an accompanying number set of each number are obtained based on the number sets: the campus base station data obtained in advance is matched with each workplace number set to form the student number set of the campus, and the accompanying number set of each student on the campus is obtained based on the student number set. For example, the number sets of the aforementioned identical workplace base stations $BS_{i,p}$ are matched with the base stations associated with the school
Figure BDA0002237981260000092
to obtain the student number set corresponding to each school
Figure BDA0002237981260000093
The school may be a university, a middle school or a primary school, which is not limited herein; preferably, the school is a university or a middle school. The accompanying number set of each student number is obtained from the student number set and the accompanying number set of each student on the campus:
Figure BDA0002237981260000094
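The matching of workplace number sets against campus base stations in step S44 might look like the sketch below. The function name `student_sets_by_school` and both input mappings are hypothetical shapes chosen for illustration.

```python
def student_sets_by_school(workplace_sets, school_base_stations):
    """Union the workplace number sets of every base station associated with each school.

    workplace_sets: base station -> set of numbers whose workplace is that station.
    school_base_stations: school -> set of base stations associated with it.
    """
    result = {}
    for school, stations in school_base_stations.items():
        students = set()
        for bs in stations:
            students |= workplace_sets.get(bs, set())
        result[school] = students
    return result


workplace_sets = {"bs1": {"a", "b"}, "bs2": {"c"}, "bs3": {"d"}}
school_base_stations = {"school_1": {"bs1", "bs2"}, "school_2": {"bs9"}}
students = student_sets_by_school(workplace_sets, school_base_stations)
```

Number "d" is excluded because its workplace base station "bs3" is not associated with any school, and "school_2" ends up with an empty set because no number has "bs9" as its workplace station.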
In a preferable embodiment of this embodiment, after step S44, the method further includes:
s45, acquiring a social relationship set based on the student number set and the accompanying number set;
specifically, a corresponding social relationship set is obtained according to the student number set and the accompanying number set of each student. The social relationship set includes: a family number set, a classmate number set and a friend number set;
further, step S45 is specifically:
acquiring the family number set of the student based on the residence accompanying number set corresponding to the student's number and the contact number set corresponding to the number;
for example, for each student $i \in L_{school,1}$, the residence accompanying number set is extracted
Figure BDA0002237981260000095
and the contact number set of the number i is then extracted
Figure BDA0002237981260000096
The intersection of the accompanying number set and the contact number set is taken to obtain the family number set;
extracting a classmate number set of the student from the workplace number set corresponding to the number;
for example: for each student $i \in L_{school,1}$, the classmate number set is extracted
Figure BDA0002237981260000101
obtaining a friend number set based on the classmate number set and the contact number set corresponding to the number;
for example: for each student $i \in L_{school,1}$, the contact number set is extracted using the call relation
Figure BDA0002237981260000102
The intersection of the classmate number set and the contact number set is taken to obtain the friend number set F = { N { T = } i ,Com i }.
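The set operations of step S45 reduce to two intersections, as the following sketch shows. The function name `social_relationships` and its parameter names are hypothetical; removing the student's own number from the classmate set is an assumption.

```python
def social_relationships(number, residence_companions, workplace_set, contacts):
    """Derive family, classmate and friend sets for one student number.

    residence_companions: numbers sharing the student's residence base station.
    workplace_set: numbers sharing the student's workplace (school) base station.
    contacts: numbers the student has called or messaged.
    """
    family = residence_companions & contacts   # lives nearby and is in contact
    classmates = workplace_set - {number}      # same school base station
    friends = classmates & contacts            # classmates the student contacts
    return {"family": family, "classmates": classmates, "friends": friends}


relations = social_relationships(
    number="s",
    residence_companions={"mom", "neighbor"},
    workplace_set={"s", "c1", "c2"},
    contacts={"mom", "c1"},
)
```

The intersection with the contact set is what separates a family member from a mere neighbor, and a friend from a classmate the student never calls.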
Specifically, referring to fig. 5, the step S42 specifically includes:
step S421, respectively extracting the activity record of the student vacation and the activity record of the student non-vacation corresponding to one number in the campus roaming number set from the activity record set;
specifically, the activity record of the student vacation and the activity record of the student non-vacation corresponding to one number in the campus roaming number set are respectively extracted from the activity record set, and the process of extracting the activity record of the student vacation is as follows:
from the activity record set $R_{histRoam}$, extract, for each student vacation (winter vacation and/or summer vacation)
Figure BDA0002237981260000103
the records within the predetermined time period
Figure BDA0002237981260000104
From each
Figure BDA0002237981260000105
all numbers are extracted to form a number set
Figure BDA0002237981260000106
where n denotes the student vacation index, and t1 and t2 delimit the corresponding time period (e.g., t1 is July 10 and t2 is August 30). For each number $i \in L_{n,hisRoam}$, all records of that number within student vacation n are acquired and divided into records for workday daytime, workday night, and public holidays, respectively:
Figure BDA0002237981260000107
Figure BDA0002237981260000108
It should be noted that the vacation period $V_n$ is defined with respect to students, whereas workdays are defined with respect to non-students, so that they can be distinguished from public holidays (e.g., weekends and statutory holidays).
Step S422, obtaining a residence base station corresponding to the number based on the activity record of the student vacation;
specifically, the residence base station corresponding to the number is obtained based on the activity records of the student vacation. First, the base station data in which the number appears during workday daytime, workday night, and public holidays of the student vacation are acquired separately. Then, from the base station data of each of the three periods, the base station that appears on the largest number of days is extracted as the target base station for that period. The number of days of each target base station is compared with the corresponding preset threshold to obtain a comparison result, and the residence base station corresponding to the number is obtained based on the comparison results. For example, separately acquire, for the sets
Figure BDA0002237981260000111
Figure BDA0002237981260000112
the base station that appears on the most days within each set,
Figure BDA0002237981260000113
and the corresponding number of days,
Figure BDA0002237981260000114
The numbers of days are compared with the corresponding preset thresholds, which are:
Figure BDA0002237981260000115
The corresponding comparison results are:
Figure BDA0002237981260000116
where $Res = 1$ if $D \ge Thre$ and $Res = 0$ if $D < Thre$; Thre is a preset threshold, Res is the comparison result, and D denotes the number of days. For each number i and for every base station that occurred in the comparison,
Figure BDA0002237981260000117
the sum $res_{i,j} = \sum_{n}\sum_{k=1,2,3} res_{n,i,k}$ is computed, where k is the time-segment type: the vacation is divided into workday daytime, workday night, and public holiday, with k = 1, 2, 3 respectively. Then
Figure BDA0002237981260000118
where $BS_{i,r}$ is the residence base station of number i. Preferably, a residence-frequency label is added to the number, whose value is:
Figure BDA0002237981260000121
where $j_0$ is the identified residence base station. The specific values of the preset thresholds
Figure BDA0002237981260000122
can be set according to actual conditions and are not limited here; for example, they can be set according to the length of the vacation period or according to other conditions.
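The threshold comparison in step S422 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented formula: the threshold values, the record layout `(vacation_n, segment_k, day, base_station)`, and the final rule of choosing the base station with the largest summed result are all assumptions, because the selection formula itself is given only as a figure.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds (in days) per time segment k:
# 1 = workday daytime, 2 = workday night, 3 = public holiday.
THRESHOLDS = {1: 10, 2: 10, 3: 4}


def residence_base_station(records, thresholds=THRESHOLDS):
    """records: iterable of (vacation_n, segment_k, day, base_station)."""
    # Distinct days on which each base station was seen, per (n, k).
    days = defaultdict(set)
    for n, k, day, bs in records:
        days[(n, k, bs)].add(day)

    per_segment = defaultdict(Counter)
    for (n, k, bs), seen in days.items():
        per_segment[(n, k)][bs] = len(seen)

    score = Counter()  # summed comparison results per base station
    for (n, k), counter in per_segment.items():
        # Base station with the most days in this segment (ties broken by name).
        bs, d = max(counter.items(), key=lambda kv: (kv[1], kv[0]))
        if d >= thresholds[k]:  # Res = 1 when D >= Thre, otherwise 0
            score[bs] += 1
    if not score:
        return None, 0
    # Assumed selection rule: the station with the largest summed result.
    return max(score.items(), key=lambda kv: (kv[1], kv[0]))


# Illustrative data only: one vacation (n = 1).
records = (
    [(1, 2, day, "bs_home") for day in range(1, 13)]    # 12 workday nights
    + [(1, 1, day, "bs_home") for day in range(1, 12)]  # 11 workday daytimes
    + [(1, 1, day, "bs_mall") for day in range(1, 4)]   # 3 workday daytimes
    + [(1, 3, day, "bs_park") for day in range(1, 3)]   # 2 holidays, below threshold
)
station, score = residence_base_station(records)
print(station, score)
```

In this example `bs_home` passes the threshold in two segments, while `bs_park` is the top holiday station but falls below the holiday threshold and contributes nothing.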
Step S423, obtaining the corresponding work place base station based on the activity record of the non-vacation period of the student;
specifically, similar to the method in step S422, the corresponding workplace base station is obtained from the activity records of the number during non-vacation periods. Preferably, the non-vacation activity records are divided into three groups (workday daytime, workday night, and public holiday); the base station that appears most frequently on campus in each of the three periods is obtained from these records and compared with a set value, and the workplace (school) base station is obtained according to the comparison result. The specific implementation is consistent with the process of obtaining the residence base station in step S422 and is not described again here. The set value may be chosen according to actual circumstances and is not limited herein.
After the workplace base station and the residence base station of one number are acquired, steps S421 to S423 are executed again to acquire the workplace base station and the residence base station of another number, until the workplace base station and the residence base station of each number in the campus roaming number set are acquired. It should be noted that, for students, the workplace base station is a school base station. The list of base stations of each school is usually kept on file by the operator and is kept accurate and up to date through routine drive tests and CQT (call quality testing).
Meanwhile, the above steps distinguish three time periods within the student vacation. The main reason is that the activities of students and of teaching staff differ across the three periods, so using different thresholds for the three periods makes it easier to identify students rather than teaching staff.
Specifically, referring to fig. 6, the step S1312 specifically includes:
step S61, acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
in particular, a mobile phone data stream record $r_{i,t,grid}$ is obtained from a base station associated with the campus and comprises: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the precise-positioning grid number grid, the grid center longitude and latitude (gridlon/gridlat), the call/short message identifier actType (call, SMS, other), the opposite-end number of the call/short message conNum, and the incoming/outgoing identifier direction (in/out). The real-time mobile phone data stream $R_{RTAccuSchRoam}$ corresponding to the campus roaming number set comprises the real-time mobile phone data stream records of each number, $r_{i,t,grid} \in R_{RTAccuSchRoam}$.
Step S62, acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
specifically, position data of a user is obtained according to the obtained mobile phone data stream, and identification is carried out according to the position data to obtain a corresponding identification result;
obtaining the identification result corresponding to one number through the steps S61 and S62, and then repeating the steps S61 and S62 to obtain the identification result of another number until obtaining the identification result of each number in the campus roaming number set, thereby obtaining the student number list.
Specifically, referring to fig. 7, the step S62 specifically includes:
step S621, acquiring position data of the corresponding number in the bedtime period;
specifically, the location data includes location information of the number acquired by the corresponding base station, and since the user may change different locations in a day, there may be a plurality of grids corresponding to the locations, where the location data includes a plurality of corresponding grids (i.e., a plurality of grids in which the number appears), and the grids include grid numbers, grid center longitude and latitude, and other information.
Step S622, matching the corresponding user based on the position data appearing in the bedtime period, obtaining a first matching result;
specifically, the grids in which the number appears during the bedtime period are analyzed and matched against the grids of each dormitory on campus to obtain the first matching result, which is the dormitory of the user corresponding to the number.
In a preferred example of this embodiment, as shown in fig. 8, the step S622 specifically includes:
step S81, acquiring the occurrence frequency of each grid in a bedtime period;
specifically, the grids in which the number appears during the bedtime period, and the corresponding numbers of occurrences, are obtained from the mobile phone data stream. The bedtime period is the bedtime set by the school, for example 10 p.m. to 7 a.m.;
step S82, selecting a preset number of grids from the plurality of grids appearing in the bedtime period, wherein any selected grid appears in the bedtime period more often than any unselected grid;
specifically, because the grids have different occurrence counts, the grids are sorted by occurrence count and the preset number of grids with the highest counts are selected. The preset number can be set according to actual conditions (for example 3 or 5) and is not limited herein. For example, if the number appears in 10 grids, the grids are sorted by occurrence count from high to low and the top five are selected;
step S83, matching the selected grids against the grids of each dormitory on the campus to obtain the first matching result;
specifically, each dormitory has a corresponding outline polygon. Each selected grid is matched against the outline polygons by judging whether the longitude and latitude of the grid center are enclosed by an outline polygon; if so, the grid matches that dormitory.
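Steps S81 to S83 can be sketched in Python as follows. The patent does not say how the inclusion test is implemented; the sketch assumes a standard ray-casting point-in-polygon test on planar longitude/latitude coordinates, and all function names, grid identifiers, and polygons are hypothetical.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for a in range(n):
        x1, y1 = polygon[a]
        x2, y2 = polygon[(a + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def match_dormitories(grid_hits, grid_centers, dorm_polygons, top_n=5):
    """Count, per dormitory, how many of the top_n bedtime grids fall inside it."""
    # Steps S81/S82: occurrence count per grid, then keep the top_n grids.
    top = [g for g, _ in Counter(grid_hits).most_common(top_n)]
    matches = Counter()
    # Step S83: test each selected grid center against each dormitory outline.
    for g in top:
        lon, lat = grid_centers[g]
        for dorm, polygon in dorm_polygons.items():
            if point_in_polygon(lon, lat, polygon):
                matches[dorm] += 1
    return matches


# Illustrative data only.
grid_hits = ["g1", "g1", "g1", "g2", "g2", "g3"]
grid_centers = {"g1": (0.5, 0.5), "g2": (0.6, 0.4), "g3": (5.0, 5.0)}
dorm_polygons = {
    "dorm_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "dorm_B": [(4, 4), (6, 4), (6, 6), (4, 6)],
}
matches = match_dormitories(grid_hits, grid_centers, dorm_polygons, top_n=2)
print(matches.most_common(1))
```

With `top_n=2`, only grids `g1` and `g2` are tested; both centers lie inside `dorm_A`, so that dormitory receives two matches and `dorm_B` none.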
For ease of understanding, the identification process is described in detail below:
because the number appears in more than one grid during the bedtime period, with differing occurrence counts, the mobile phone data stream records $r_{i,t,grid} \in R_{RTAccuSchRoam}$ of each number are used to calculate, within the bedtime period
Figure BDA0002237981260000141
the number of occurrences of each grid. For each number $i \in L_{SchRoam}$, a corresponding bedtime grid activity vector is obtained,
Figure BDA0002237981260000142
(u is the grid number); the activity vector records the number of times number $i \in L_{SchRoam}$ appears in $grid_u$ during the bedtime period. For each number $i \in L_{SchRoam}$, take each day from
Figure BDA0002237981260000143
the five grid numbers with the highest values, $G_{i,s}$ $(1, \dots, 5)$. The outline polygon of each dormitory,
Figure BDA0002237981260000144
is then used to test the grid center longitude and latitude $(gridlon_{i,s}, gridlat_{i,s})$ of each $G_{i,s}$ for inclusion (that is, whether the center lies inside the outline polygon). If the center is enclosed by the outline polygon, the grid matches, and each successful match adds 1 to the match count of the corresponding dormitory, namely $Dorm_{i,s} = Dorm_{i,s} + 1$, and
Figure BDA0002237981260000145
Each day, the dormitory s with the highest $Dorm_{i,s}$ value is the best estimate $Dorm_i$ of the dormitory of the user corresponding to number i for that day. Because a student may, for example, visit other dormitories, the value of $Dorm_i$ may vary at first, but it stabilizes after a period of time (for example one or two weeks after the start of term). The stabilized $Dorm_i$ is taken as the output value and a label is added to number i; the label carries the dormitory number and the corresponding frequency, whose value is
Figure BDA0002237981260000151
where s is the dormitory number corresponding to $Dorm_i$.
Step S623, acquiring position data of the corresponding number appearing in at least one department activity time period;
specifically, an activity list of each department on the campus is obtained in advance. The activity list includes information such as the activity time, the grids, the activity content, and the organizing department. The position data of the number (such as the grids and the numbers of occurrences) is obtained based on the mobile phone data stream and the activity times of each department;
step S624, matching the corresponding department based on the acquired position data to obtain a second matching result;
specifically, the obtained grids are matched against the grids in which the department of each activity in the activity list is located to obtain the corresponding second matching result. The second matching result is the department of the user corresponding to the number; for example, the grid in which the number appears is matched against the corresponding department outline polygon to obtain the second matching result.
Preferably, the step S624 specifically includes:
acquiring grid data of the corresponding number appearing in each department activity time period based on the position data;
specifically, the grid data appearing in each department activity time period is obtained based on the position data, wherein the grid data comprises the grids and the corresponding numbers of occurrences;
matching the grid with the largest number of occurrences against the position of the corresponding department to obtain a second matching result;
specifically, because the number appears in more than one grid during an activity time period and the occurrence counts differ, the grids are sorted by occurrence count to obtain the grid with the most occurrences for each department activity. These grids are then matched against the positions of the corresponding departments to obtain the department of the user corresponding to the number.
For ease of understanding, the identification process is described in detail below:
obtain the campus activity plan, which comprises a list of department activities $Act(ActName_h, t_{h,1}, t_{h,2}, Dept_h)$. In each activity time period $(t_{h,1}, t_{h,2})$, for each number $i \in L_{SchRoam}$, compute each grid in which the number appeared during the period and the corresponding number of occurrences, and take for each number i the grid $G_{i,h}$ in which it was most active during activity h. The department outline polygon corresponding to activity h,
Figure BDA0002237981260000161
is used together with the grid center longitude and latitude of $G_{i,h}$,
Figure BDA0002237981260000162
to calculate the inclusion relation (that is, to judge whether $G_{i,h}$ is enclosed by the department outline polygon). Enclosure can also be judged from the difference between the distance from $G_{i,h}$ to the center point of the department outline polygon and the maximum distance from that center point to the edges of the polygon: a difference greater than 0 indicates that the grid is outside the polygon; otherwise it is treated as not outside. If the grid is enclosed, the match succeeds, and each successfully matched activity adds 1 to the match count of the corresponding department, namely
Figure BDA0002237981260000163
and
Figure BDA0002237981260000164
Each day, from
Figure BDA0002237981260000165
the department $dept_h$ with the highest value is taken as the best estimate $Dept_i$ of the department of the user corresponding to number i. The value of $Dept_i$ may fluctuate during an initial period (for example one week) but stabilizes afterwards, because students' activities settle down some time after enrollment. Number i is then tagged with a label carrying the department identifier, the frequency, and so on, where the frequency is
Figure BDA0002237981260000166
where $dept_h$ is the department corresponding to $Dept_i$.
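The distance-based check described above can be sketched as follows. This is a rough illustration under assumptions: the center point is taken as the mean of the polygon vertices, the maximum distance is measured to the vertices, and planar distances are computed directly on longitude/latitude values. None of these choices is specified in the source, and the function name is hypothetical.

```python
import math


def outside_by_radius(point, polygon):
    """Coarse test: True when the point is farther from the polygon's center
    than the farthest vertex, in which case it cannot lie inside the polygon."""
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    max_r = max(math.hypot(x - cx, y - cy) for x, y in polygon)
    d = math.hypot(point[0] - cx, point[1] - cy)
    return d - max_r > 0  # difference > 0 means outside


# Illustrative department outline: a 2 x 2 square.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(outside_by_radius((1.0, 1.0), square))  # center of the square
print(outside_by_radius((5.0, 5.0), square))  # far away
print(outside_by_radius((2.3, 1.0), square))  # just beyond the right edge
```

The third point shows the limit of this shortcut: it lies outside the square but inside the circle through the corners, so the check reports it as not outside. The radius test is therefore a cheap pre-filter, and an exact inclusion test is still needed for points it does not reject.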
Step S625, obtaining an identification result based on the first matching result and the second matching result;
specifically, the identification result of the number is obtained by combining the first matching result and the second matching result: for example, the dormitory of the user corresponding to the number is obtained from the first matching result, the department of the user is obtained from the second matching result, whether the user is a student is determined, and the corresponding result is output.
In a preferable scheme of this embodiment, after step S625, the method further includes:
acquiring a companion number set of students based on the campus roaming number set;
specifically, after a number has been matched with a department and a dormitory, the user corresponding to the number is identified as a student, and the companion number set corresponding to the students is then obtained based on the campus roaming number set;
further, each number $i \in L_{SchRoam}$ whose associated values $Dorm_i$ and $Dept_i$ are both nonzero satisfies $i \in L_{School,2}$, and the other numbers in the campus roaming number set form the companion number set
Figure BDA0002237981260000171
At the same time, for each number $i \in L_{SchRoam}$, the corresponding $Dorm_i$ and $Dept_i$, the dormitory frequency
Figure BDA0002237981260000172
and the department frequency
Figure BDA0002237981260000173
Two values are output as dormitory and institution labels of number i, i belongs to L School,2 Is corresponding to the number of>
Figure BDA0002237981260000176
And & ->
Figure BDA0002237981260000174
The values are all 0.
Specifically, referring to fig. 9, the step S1313 specifically includes:
step S91, acquiring a mobile phone data stream set from a base station associated with a campus;
specifically, a real-time mobile phone data stream set $R_{RT}$ is acquired from the base stations associated with the campus. The set comprises the data stream records of a plurality of numbers, and each data stream record comprises corresponding behavior data;
step S92, matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
specifically, the records in the mobile phone data stream set that correspond to numbers outside the campus roaming number set $L_{SchRoam}$ are first removed. The mobile phone data stream set is then matched against the campus roaming number set; for example, each number in $L_{SchRoam}$ is compared with the mobile phone data stream set to obtain the data stream records of that number, which together form the mobile phone data stream subset $R_{RTSchRoam}$.
Step S93, acquiring behavior data corresponding to a number from the mobile phone data stream subset;
in particular, the mobile phone data stream data of each number $i \in L_{SchRoam}$ is obtained from $R_{RTSchRoam}$:
Figure BDA0002237981260000175
Behavior data is then encoded for each number i based on the mobile phone data stream data by extracting the mobile phone browsing records $r_{i,t,loc,url}$. A mobile phone browsing record comprises: the number $i \in L_{SchRoam}$, the recording time t (time of occurrence), the recording base station bsid, the base station longitude and latitude (lon/lat), the accessed page data (pageTypeId), the APP usage data (appId), the call/short message identifier actType (call, SMS, other), the opposite-end number of the call/short message conNum, and the incoming/outgoing identifier direction (in/out).
Step S94, substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
specifically, a two-dimensional matrix is established in advance as follows: all APP usage data (appId) and accessed page data (pageTypeId) are arranged in order along one axis (the order depends on the encoding of pageTypeId and appId), and the recording base stations bsid are arranged along the other axis, to construct a two-dimensional matrix of the usage behavior of the user of number i, expressed as:
Figure BDA0002237981260000181
Substituting the behavior data into the two-dimensional matrix, a corresponding two-dimensional array is constructed for each number $i \in L_{SchRoam}$:
Figure BDA0002237981260000182
where the rows (1 … c) correspond to the c sorted appId and pageTypeId values, and the columns (1 … m) correspond to the m sorted recording base stations bsid;
Figure BDA0002237981260000183
denotes the number of times the user of number i uses application α or browses web content category α under base station β.
Step S95, performing column normalization processing on the behavior matrix to obtain a processed behavior matrix;
in particular, each behavior matrix $Act_i$ is column-normalized to obtain the processed behavior matrix
Figure BDA0002237981260000184
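Steps S94 and S95 can be sketched as follows. The exact normalization formula appears only as a figure, so the sketch assumes that each column is divided by its column sum; that choice, the function names, and the sample identifiers are assumptions made for illustration.

```python
def build_behavior_matrix(events, row_ids, base_stations):
    """events: iterable of (category_id, bsid). Returns a c x m count matrix
    with one row per appId/pageTypeId and one column per base station."""
    row_index = {r: a for a, r in enumerate(row_ids)}
    col_index = {b: j for j, b in enumerate(base_stations)}
    matrix = [[0] * len(base_stations) for _ in row_ids]
    for category_id, bsid in events:
        matrix[row_index[category_id]][col_index[bsid]] += 1
    return matrix


def normalize_columns(matrix):
    """Divide each column by its sum; all-zero columns stay zero (assumed rule)."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [sum(matrix[a][j] for a in range(rows)) for j in range(cols)]
    return [
        [matrix[a][j] / sums[j] if sums[j] else 0.0 for j in range(cols)]
        for a in range(rows)
    ]


# Illustrative data only.
row_ids = ["app:chat", "app:video", "page:news"]
base_stations = ["bs1", "bs2"]
events = [("app:chat", "bs1")] * 3 + [("app:video", "bs1")] + [("page:news", "bs2")] * 2

act = build_behavior_matrix(events, row_ids, base_stations)
act_norm = normalize_columns(act)
print(act)
print(act_norm)
```

After normalization each non-empty column sums to 1, so a number's behavior under a given base station is expressed as a distribution over applications and page categories rather than as raw counts.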
Step S96, calculating corresponding behavior correlation values based on the processed behavior matrix and the standard student behavior matrix;
specifically, at the start of the school term, each school carries out a campus marketing campaign, which confirms the true identity of a portion of the numbers; from these, a student list $L_{student}$ and a parent list $L_{family}$ are derived. The two lists are input into the two-dimensional matrix to train a student identification model, and the campus roaming number set is matched against the student list and the parent list to obtain the set of campus numbers whose identity is undetermined:
Figure BDA0002237981260000191
Based on the student list, $i \in L_{student}$, and the corresponding processed behavior matrices
Figure BDA0002237981260000192
a student behavior model is established:
Figure BDA0002237981260000193
Column normalization is applied to this behavior matrix model to obtain the standard student behavior model matrix
Figure BDA0002237981260000194
Then, for each number of undetermined identity (to be identified) $i \in L_{undefined}$, its behavior correlation value with the standard student is calculated from the processed behavior matrix of the number
Figure BDA0002237981260000195
and the processed behavior matrix of the standard student
Figure BDA0002237981260000196
The behavior correlation value is calculated as follows:
Figure BDA0002237981260000197
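The correlation formula itself is given only as a figure. As a stand-in, the following sketch computes a Pearson correlation coefficient over the flattened entries of two equally sized matrices; this is an assumed substitute for illustration, not necessarily the formula used in the patent, and the function name is hypothetical.

```python
import math


def matrix_correlation(a, b):
    """Pearson correlation of the entries of two equally sized matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant matrix carries no pattern to correlate with
    return cov / math.sqrt(vx * vy)


# Illustrative matrices only.
standard = [[0.75, 0.0], [0.25, 1.0]]
same = [[0.75, 0.0], [0.25, 1.0]]
opposite = [[0.25, 1.0], [0.75, 0.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]

print(matrix_correlation(standard, same))
print(matrix_correlation(standard, opposite))
print(matrix_correlation(standard, flat))
```

A number whose normalized behavior matrix closely follows the standard student matrix yields a value near 1, which can then be compared with a cutoff derived from confirmed students.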
step S97, comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
in particular, for all student numbers whose identities have been confirmed, $i' \in L_{student}$, the pre-calculated behavior correlation value of each number $i' \in L_{student}$ is obtained
Figure BDA0002237981260000201
(calculated in the same manner as above). Take
Figure BDA0002237981260000202
and use $r_{cutoff}$ as the criterion for confirming the numbers still to be confirmed: the behavior correlation value of each number to be identified is compared with this criterion, whether the user of the number is a student is determined from the comparison result, and an identification result is obtained for each number. Finally a student list and an accompanying-person list are obtained. The student list is:
Figure BDA0002237981260000203
The accompanying-person list is:
Figure BDA0002237981260000204
Then each identified student number is tagged with a behavior label, i.e.
Figure BDA0002237981260000205
Specifically, the step S132 specifically includes: inputting the identification result based on the activity records, the identification result based on the real-time mobile phone data stream, and the identification result based on the behavior data into a merging model for learning and training, and outputting a student number list;
further, the merging model is a two-layer neural network model consisting of a first-layer neural network and a second-layer neural network. The first layer includes three neurons and the second layer includes two neurons. The first layer receives the three identification results, specifically:
Figure BDA0002237981260000206
the two neurons of the second layer comprise two weight matrices, respectively:
Figure BDA0002237981260000211
The merging model comprises the following structure:
Figure BDA0002237981260000212
Figure BDA0002237981260000213
Figure BDA0002237981260000214
where i is the index of an individual (that is, of a particular number), n is the sample size, and k is the index of a neuron in the first-layer network; w is the weight of each neuron, a is the output of a neuron, and b denotes the bias vectors of the two layers; these are the usual quantities of a standard neuron computation.
Figure BDA0002237981260000215
is the activation function of any neuron, where z is a parameter that adjusts the shape of the activation function σ and is a standard neural network activation function setting. The cost function is:
Figure BDA0002237981260000216
Figure BDA0002237981260000217
is the identity of the input (that is, of the user of number i) as confirmed in the marketing campaign, and
Figure BDA0002237981260000218
are the respective bias vectors.
Figure BDA0002237981260000221
The three identification results are input into the three neurons of the first-layer neural network for evaluation;
Figure BDA0002237981260000222
the first layer passes its output to the second layer for learning and training, and the result is the identity, either student or accompanying parent. If the marketing campaign confirmed the number as a student, the vector component is_student = 1; if it confirmed the number as an accompanying parent, the component is_family = 1; if there is no marketing campaign confirmation, both are 0 and the sample is discarded; if the marketing campaign feedback confirms both, both are 1 and the sample is likewise discarded. For the output result
Figure BDA0002237981260000223
the cutoff for deciding each of the two components is 0.5.
In this embodiment, the training data of the merging model comes from the confirmed student list and the confirmed parent list fed back by the marketing campaign, and the model is trained by backpropagation. During the start of term each year, the student list and the parent list are updated daily, and the merging model is retrained daily on the updated lists to obtain an updated merging model. The more confirmed data the model is trained on, the more reliable its identification becomes.
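The structure of the merging model (three first-layer neurons, two second-layer neurons, a 0.5 cutoff on each output component) can be sketched as a forward pass. The sketch assumes a logistic sigmoid activation and uses made-up weights and biases; in the described method these parameters would be learned by backpropagation from the marketing-confirmed lists, which is omitted here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: one weight row and one bias per neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def merge(results, w1, b1, w2, b2, cutoff=0.5):
    """results: the three identification results for one number."""
    hidden = layer(results, w1, b1)  # first layer: three neurons
    out = layer(hidden, w2, b2)      # second layer: (is_student, is_family)
    return out, (out[0] >= cutoff, out[1] >= cutoff)


# Hypothetical parameters, for illustration only (not trained values).
W1 = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
B1 = [-2, -2, -2]
W2 = [[2, 2, 2], [-2, -2, -2]]
B2 = [-3, 3]

out_pos, flags_pos = merge([1, 1, 1], W1, B1, W2, B2)
out_neg, flags_neg = merge([0, 0, 0], W1, B1, W2, B2)
print(flags_pos, flags_neg)
```

With these illustrative parameters, a number flagged as a student by all three methods is classified as a student, and a number flagged by none is classified as an accompanying parent. Because each output component is thresholded independently, both flags can be true for the same number.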
The merging process is as follows:
for each unidentified number
Figure BDA0002237981260000224
the corresponding three identification results are fed into the merging model and the identification is recalculated, giving the output result
Figure BDA0002237981260000225
From the output, a presumed student list is formed,
Figure BDA0002237981260000226
together with a presumed parent list
Figure BDA0002237981260000231
Note that a number may appear in both the presumed student list and the presumed parent list.
After the presumed lists are obtained, the tags of each number in the lists need to be confirmed: the three identification results are compared with the result obtained by inputting them into the merging model, and the tags from the three identification results are selectively output to the final user tags. The specific process is as follows:
confirmation of the companion tag:
Figure BDA0002237981260000232
the companion tag is retained;
Figure BDA0002237981260000233
and $i \in L_{School,1}$: the companion tag is temporarily withheld, that is, it is not kept in the list but is stored for later use;
confirmation of the dormitory and department tags:
Figure BDA0002237981260000234
the department and dormitory tags are retained;
Figure BDA0002237981260000235
and $i \in L_{School,2}$: none of its tags are used for the time being, that is, they are not kept in the list but are stored for later use;
after the student list and the parent list confirmed by marketing campaign feedback are updated, the corresponding tags need to be confirmed again for each number i newly added to the lists;
confirmation of the companion tag:
Figure BDA0002237981260000236
the companion tag is retained;
Figure BDA0002237981260000237
the corresponding tag is discarded;
confirmation of the dormitory and department tags:
Figure BDA0002237981260000238
the dormitory and department tags are retained;
Figure BDA0002237981260000239
the corresponding tags are discarded;
in this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on activity records of users of mobile phone numbers, real-time mobile phone data streams, user behavior data, and the like, and a student number list is output, so that the accuracy of student number identification can be improved.
In a second embodiment of the present invention, fig. 10 shows a flow chart of the student identification method provided in this embodiment. The student identification method includes:
step S101, acquiring a campus roaming number set.
Specifically, the campus roaming number set is built from the numbers that enter the area where the campus is located and are captured by the base stations associated with the campus. Campus resident numbers are removed from the captured numbers, and the remaining numbers form the campus roaming number set; that is, the time for which each roaming number has appeared on campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base stations during the year ending on the day before the start of term are obtained, the numbers that appeared for half a year or more are removed, and the remaining numbers form the campus roaming number set. A campus resident number is a number that has been present at the associated base stations for more than half a year and can be regarded as belonging to a permanent resident of the campus; a number present for less than half a year may be regarded as a roaming number. The duration may be counted cumulatively; for example, a number whose total time of appearance within one year reaches half a year or more can be regarded as a resident number. It should be noted that the associated base stations store the real-time mobile phone data of the captured numbers, and a base station may be associated with multiple schools, so the campus roaming number set may include the roaming numbers of multiple schools. Preferably, the numbers are separated by school, so that each school has its own campus roaming number list. The roaming numbers may belong to students at the school or to their parents, classmates, or friends.
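The resident/roaming split in step S101 can be sketched as a simple filter. The sketch assumes that presence is measured as the number of distinct days on which a number was captured during the past year and that "half a year" is taken as 183 days; both are assumptions, as are the function and variable names.

```python
def campus_roaming_numbers(days_seen, resident_days=183):
    """days_seen: {number: set of days on which the number was captured by
    campus-associated base stations during the past year}.

    Numbers seen on resident_days or more days are treated as campus
    residents and removed; the rest form the campus roaming number set.
    """
    return {num for num, days in days_seen.items() if len(days) < resident_days}


# Illustrative data only: day indices within the past year.
days_seen = {
    "new_student": set(range(30)),   # seen on 30 days
    "teacher": set(range(300)),      # seen on 300 days, a resident
    "visitor": {5},                  # seen once
}
roaming = campus_roaming_numbers(days_seen)
print(sorted(roaming))
```

Counting distinct days rather than consecutive days matches the cumulative interpretation in the text, where a number's total time of appearance within the year determines whether it is a resident number.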
Step S102, acquiring attribute data of a plurality of roaming numbers;
Specifically, the attribute data of each number in the campus roaming number set L_SchRoam is obtained. The attribute data includes, for example, the activity record of the user corresponding to each roaming number in a preset time period. Further, the activity record set R_histRoam of all numbers in the campus roaming number set L_SchRoam over the last year may be extracted from a big data platform. The activity record r_{i,t,bs} corresponding to each number in the activity record set comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
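As an illustration of the record layout listed above, one activity record and the activity record set could be represented as follows. The class name `ActivityRecord`, the helper `build_record_set`, and the Python field names are hypothetical; only the field meanings come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ActivityRecord:
    number: str                # number i, a member of L_SchRoam
    time: datetime             # recording time t
    bs: str                    # recording base station bs
    lon: float                 # base station longitude
    lat: float                 # base station latitude
    act_type: str              # actType: "call", "SMS" or "other"
    con_num: Optional[str]     # conNum: opposite-end number, if any
    direction: Optional[str]   # direction: "in" or "out", if any


def build_record_set(records, roaming_numbers):
    """Group records by number, keeping only numbers in the roaming set."""
    r_hist_roam = defaultdict(list)
    for rec in records:
        if rec.number in roaming_numbers:
            r_hist_roam[rec.number].append(rec)
    return dict(r_hist_roam)
```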
Step S103, identifying a student number from the campus roaming number set based on the activity record of a preset time period, and outputting an identification result based on the activity record;
specifically, each number in the campus roaming number set is identified based on the attribute data to obtain a corresponding identification result, where the identification result includes at least a student number list, and may also include a parent number list and an uncertain list, which are not limited herein.
In this embodiment, a campus roaming number set is first obtained, and each number in the set is identified based on the activity records of the user of the mobile phone number, so that the accuracy of number identification can be improved.
In this embodiment, referring to fig. 11, the step S103 specifically includes:
step S111, acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
Specifically, the activity record set R_histRoam of all numbers in the campus roaming number set L_SchRoam of the campus over the last year is first extracted from a big data platform. The activity record r_{i,t,bs} corresponding to each number in the activity record set comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The activity records of each number i are collected into the activity record set R_histRoam.
Step S112, acquiring base station data corresponding to each number based on the activity record set;
Specifically, each activity record is analyzed to identify the base station data of the corresponding number, for example the residence base station BS_{i,r} and the work place base station BS_{i,p} of the user corresponding to the number (for students, the work place base station is a school base station); that is, the position of the residence and the position of the work place of the user are identified;
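The description does not state how the residence and work place base stations are derived from the activity records. One common heuristic, shown here purely as an assumption, is to take the most frequent base station during night hours as BS_{i,r} and the most frequent one during daytime hours as BS_{i,p}. The function name and the hour windows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def home_and_work_station(records, night=(22, 6), day=(9, 17)):
    """Guess (BS_r, BS_p) for one number from (time, base_station) pairs.

    Assumption: night-time activity indicates the residence and
    daytime activity indicates the work place (school for students).
    """
    night_bs, day_bs = Counter(), Counter()
    for t, bs in records:
        if t.hour >= night[0] or t.hour < night[1]:
            night_bs[bs] += 1
        elif day[0] <= t.hour < day[1]:
            day_bs[bs] += 1
    bs_r = night_bs.most_common(1)[0][0] if night_bs else None
    bs_p = day_bs.most_common(1)[0][0] if day_bs else None
    return bs_r, bs_p
```

Records outside both windows are ignored in this sketch, and a number with no activity in a window yields `None` for that base station.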
step S113, acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of the corresponding base station;
Specifically, after the work place base station and the residence base station of the user corresponding to each number are obtained, the number sets appearing in the same base station are acquired based on the work place base stations and the residence base stations respectively. The number sets include target number sets and accompanying number sets, and the target number sets include a number set for each residence base station and a number set for each work place base station. Based on the residence base station corresponding to each number, a corresponding residence number set is formed for each residence base station, and the residence accompanying number set corresponding to each number is obtained from the residence number sets. Then, based on the work place base station corresponding to each number, a corresponding work place number set is formed for each work place base station, and the work place accompanying number set corresponding to each number is obtained from the work place number sets. For example, the numbers with the same residence base station BS_{i,r} are collected to form a number set corresponding to each residence base station, and the numbers with the same work place base station BS_{i,p} are collected to form a number set corresponding to each work place base station. According to the work place base station BS_{i,p}, an accompanying number set is obtained for each number:
Figure RE-GDA0002280497990000261
According to the residence base station BS_{i,r}, an accompanying number set is obtained for each number:
Figure RE-GDA0002280497990000262
It should be noted that the work place number sets may be obtained first and the residence number sets afterwards, or the work place number sets and the residence number sets may be obtained simultaneously, which is not limited herein.
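The grouping of numbers by base station and the derivation of each number's accompanying set can be sketched as follows. The sketch assumes that the accompanying set of a number consists of the other numbers sharing its base station; the exact definition is given by the equation figures above, and the function name is hypothetical. The same function applies to residence base stations and to work place base stations.

```python
from collections import defaultdict


def accompanying_sets(station_of):
    """station_of: dict mapping each number to its base station.

    Returns (members, companions):
      members[bs]   -> set of numbers assigned to base station bs
      companions[n] -> numbers sharing n's base station, excluding n itself
    """
    members = defaultdict(set)
    for number, bs in station_of.items():
        members[bs].add(number)
    companions = {n: members[bs] - {n} for n, bs in station_of.items()}
    return dict(members), companions
```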
Step S114, acquiring a student number set corresponding to each school and an accompanying number set of each number based on the number sets;
Specifically, a student number set corresponding to each school and an accompanying number set of each number are obtained based on the number sets: campus base station data obtained in advance is matched with each work place number set to form the student number set corresponding to the campus, and the accompanying number set of each student on the campus is obtained based on the student number set. For example, the numbers with the same work place base station BS_{i,p} obtained above are matched with the base stations associated with each school
Figure BDA0002237981260000263
to obtain the student number set corresponding to each school
Figure BDA0002237981260000264
The school may be a university, a middle school, or a primary school, which is not limited herein; preferably, the school is a university or a middle school. The accompanying number set of each student number is obtained according to the student number set and the accompanying number set of each student on the campus:
Figure BDA0002237981260000265
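The matching of work place number sets against the base stations associated with each school can be sketched as a simple lookup. The function name and the dictionary-based inputs are assumptions made for illustration only.

```python
def students_by_school(work_members, school_stations):
    """Form the student number set of each school.

    work_members: dict mapping a work place base station to its number set.
    school_stations: dict mapping a school to the set of base stations
    associated with it (pre-obtained campus base station data).
    """
    return {
        school: set().union(*(work_members.get(bs, set()) for bs in stations))
        for school, stations in school_stations.items()
    }
```

A school whose base stations match no work place number set simply receives an empty student set in this sketch.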
Step S115, acquiring a social relationship set based on the student number set and the accompanying number set;
Specifically, a corresponding social relationship set is obtained according to the student number set and the accompanying number set of each student. The social relationship set includes: a family number set, a classmate number set and a friend number set;
it should be noted that, steps S111 to S115 in this embodiment are the same as steps S41 to S45 shown in fig. 4 in the first preferred embodiment, and the implementation processes and technical effects of the steps are the same, and specific reference is made to the above description, which is not described herein again.
In a third embodiment of the present invention, fig. 12 shows a flow chart of a student identification method provided in an embodiment of the present invention, where the student identification method includes:
step S121, acquiring a campus roaming number set.
Specifically, the campus roaming number set is built from the numbers that enter the area where a campus is located and are captured by a campus-associated base station. Campus resident numbers are removed from the captured numbers, and the remaining numbers serve as the campus roaming number set; the time for which these numbers have appeared on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within the year preceding the day before the start of school are obtained, the numbers that have appeared for half a year or more are removed, and the remaining numbers serve as the campus roaming number set. A campus resident number is a number that has been present at the associated base station for more than half a year; such a number can be regarded as belonging to a person resident on the campus, whereas a number present for less than half a year may be regarded as a roaming number. The length of time may be a cumulative time of appearance, for example: if the total time of appearance within a year reaches more than half a year, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured mobile phone numbers, and that a base station may be associated with multiple schools, that is, the campus roaming number set may include the roaming number sets of multiple schools; preferably, the numbers are distinguished by school, that is, each school forms its own campus roaming number list. A roaming number may belong to a student of the school or to a parent, classmate, or friend of a student.
Step S122, acquiring attribute data of the plurality of roaming numbers;
Specifically, the attribute data of each number in the campus roaming number set L_SchRoam is obtained. The attribute data includes the real-time mobile phone data stream of the user. The real-time mobile phone data stream r_{i,t,grid} of the user is obtained from a base station associated with the campus and comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the precise positioning grid number grid, the grid center longitude and latitude (gridlon/gridlat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out);
step S123, identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
specifically, each number in the campus roaming number set is identified based on the real-time mobile phone data stream, so as to obtain a corresponding identification result, where the identification result at least includes a student number list, and may also include a parent number list and an uncertain list, which are not limited herein.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on a real-time mobile phone data stream, and a student number list is output, so that the accuracy of student number identification can be improved.
Referring to fig. 13, the step S123 specifically includes:
step S1301, acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
step S1302, acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
obtaining an identification result corresponding to one number through the steps S1301 and S1302, and then repeating the steps S1301 and S1302 to obtain an identification result of another number until obtaining an identification result of each number in the campus roaming number set, thereby obtaining a student number list.
It should be noted that, in this embodiment, the specific processes of step S1301 and step S1302 are the same as the specific implementation processes of step S61 and step S62 in the first preferred embodiment, and specific reference may be made to the description of the above embodiments, and details are not described here again.
In a fourth embodiment of the present invention, fig. 14 shows a flow chart of a student identification method provided in an embodiment of the present invention, where the student identification method includes:
step S141, a campus roaming number set is obtained.
Specifically, the campus roaming number set is built from the numbers that enter the area where a campus is located and are captured by a campus-associated base station. Campus resident numbers are removed from the captured numbers, and the remaining numbers serve as the campus roaming number set; the time for which these numbers have appeared on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within the year preceding the day before the start of school are obtained, the numbers that have appeared for half a year or more are removed, and the remaining numbers serve as the campus roaming number set. A campus resident number is a number that has been present at the associated base station for more than half a year; such a number can be regarded as belonging to a person resident on the campus, whereas a number present for less than half a year may be regarded as a roaming number. The length of time may be a cumulative time of appearance, for example: if the total time of appearance within a year reaches more than half a year, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured mobile phone numbers, and that a base station may be associated with multiple schools, that is, the campus roaming number set may include the roaming number sets of multiple schools; preferably, the numbers are distinguished by school, that is, each school forms its own campus roaming number list. A roaming number may belong to a student of the school or to a parent, classmate, or friend of a student.
Step S142, acquiring attribute data of the plurality of roaming numbers;
Specifically, the attribute data of each number in the campus roaming number set L_SchRoam is obtained. The attribute data comprises behavior data of the user, and the behavior data comes from the real-time mobile phone data stream R_RT of the campus-associated base station. The real-time mobile phone data stream R_RT is matched with the campus roaming number set L_SchRoam, and the numbers that do not belong to the campus roaming number set are removed, yielding the mobile phone data stream R_RTSchoam. The mobile phone data stream R_RTSchoam is classified and summarized by number i to obtain, for each number i ∈ L_SchRoam, the short-term mobile phone data stream R_{i,RTSchoam} accumulated on the campus. After the mobile phone data stream R_{i,RTSchoam} of each user i is obtained, behavior data coding is performed on each mobile phone number, and the mobile phone browsing records r_{i,t,loc,url} are extracted from the mobile phone data stream R_{i,RTSchoam} of user i. Each browsing record comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bsid, the base station longitude and latitude (lon/lat), the accessed page classification (pageTypeId), the APP used (appId), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the like.
Step S143, identifying numbers of students from the campus roaming number set based on the behavior data, and obtaining identification results based on the behavior data;
Specifically, the behavior data acquired from the base station is matched with the campus roaming number set and each number is identified to obtain its identification result; the identification results of all numbers form the identification result based on the behavior data, which comprises a student number list and an accompanying number list of each student number.
In this embodiment, the campus roaming number set is first obtained, each number in the campus roaming number set is identified based on the behavior data of the user, and a student number list is output, so that the accuracy of student number identification can be improved.
Referring to fig. 15, the step S143 specifically includes:
step S151, acquiring a mobile phone data stream set from a base station associated with a campus;
Specifically, a real-time mobile phone data stream set R_RT is acquired from a base station associated with a campus. The mobile phone data stream set comprises data stream records of a plurality of numbers, and each data stream record comprises corresponding behavior data;
step S152, matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
Specifically, the mobile phone data stream set is matched with the campus roaming number set: the records corresponding to numbers that do not belong to the campus roaming number set L_SchRoam are first removed from the mobile phone data stream set. For example, each number in the campus roaming number set L_SchRoam is compared with the mobile phone data stream set to obtain the mobile phone data stream data corresponding to each number in L_SchRoam, which yields the mobile phone data stream subset R_RSchRoamT
Step S153, acquiring behavior data corresponding to a number from the mobile phone data stream subset;
Specifically, the mobile phone data stream data R_{i,RSchRoamT} of each number i ∈ L_SchRoam is obtained from R_RSchRoamT, behavior data coding is performed on each number i based on the mobile phone data stream data, and the mobile phone browsing records r_{i,t,loc,url} are extracted. Each mobile phone browsing record comprises: the number i ∈ L_SchRoam, the recording time t (occurrence time), the recording base station bsid, the base station longitude and latitude (lon/lat), the access page data (pageTypeId), the APP data used (appId), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
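Steps S152 and S153 amount to filtering the data stream by the roaming number set and grouping the surviving records by number. A minimal sketch, assuming each stream record is a dictionary with a hypothetical `"number"` key:

```python
from collections import defaultdict


def stream_subset_by_number(stream, roaming_numbers):
    """Drop records of numbers outside L_SchRoam and group the rest by number.

    stream: iterable of dict records from the campus-associated base station.
    roaming_numbers: the campus roaming number set.
    Returns a dict mapping each number to the list of its records.
    """
    per_number = defaultdict(list)
    for rec in stream:
        if rec["number"] in roaming_numbers:
            per_number[rec["number"]].append(rec)
    return dict(per_number)
```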
Step S154, substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
Specifically, a two-dimensional matrix is established in advance as follows: all access page data (pageTypeId) and APP usage data (appId) are arranged in order along one axis (the arrangement depends on the encoding of pageTypeId and appId), the recording base stations bsid are arranged along the other axis, and a two-dimensional matrix of the usage behavior of the user with number i is constructed, expressed as follows:
Figure BDA0002237981260000302
The behavior data is substituted into the two-dimensional matrix, and a corresponding two-dimensional array is constructed for each number i ∈ L_SchRoam:
Figure BDA0002237981260000303
where the rows (1 … n) represent the n sorted appId and pageTypeId values, the columns (1 … m) represent the m sorted recording base stations bsid, and
Figure BDA0002237981260000304
denotes the number of times that the user of number i uses the α-th application or browses the α-th web content classification under the β-th base station.
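The construction of the behavior matrix can be illustrated with plain nested lists. This sketch assumes that each record has already been reduced to a (feature, bsid) pair, where the feature is an encoded appId or pageTypeId; the function name and the string encoding of features are hypothetical.

```python
def behavior_matrix(records, features, stations):
    """Count feature usage per base station for one number.

    records: iterable of (feature, bsid) pairs for the number.
    features: ordered list of appId / pageTypeId codes (matrix rows).
    stations: ordered list of recording base stations bsid (matrix columns).
    """
    row = {f: a for a, f in enumerate(features)}
    col = {b: k for k, b in enumerate(stations)}
    act = [[0] * len(stations) for _ in features]
    for feature, bsid in records:
        # Records with an unknown feature or base station are skipped.
        if feature in row and bsid in col:
            act[row[feature]][col[bsid]] += 1
    return act
```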
Step S155, normalizing the behavior matrix to obtain a processed behavior matrix;
Specifically, column normalization is performed on each behavior matrix act_i to obtain a processed behavior matrix
Figure BDA0002237981260000311
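The description does not spell out which column normalization is used. The sketch below assumes that each column is divided by its sum, so that every non-empty column adds up to 1; other choices (for example dividing by the column maximum or by the Euclidean norm) would be equally compatible with the text.

```python
def normalize_columns(act):
    """Divide every column of a non-empty matrix by its sum.

    Assumption: sum normalization. All-zero columns are left as zeros
    to avoid division by zero.
    """
    n_rows, n_cols = len(act), len(act[0])
    totals = [sum(act[a][b] for a in range(n_rows)) for b in range(n_cols)]
    return [
        [act[a][b] / totals[b] if totals[b] else 0.0 for b in range(n_cols)]
        for a in range(n_rows)
    ]
```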
Step S156, calculating corresponding behavior correlation values based on the processed behavior matrix and the standard student behavior matrix;
Specifically, at the start of a school term, each school carries out a campus marketing campaign through which the true identities of a portion of the numbers are confirmed, and a student list L_student and a parent list L_family are derived from these confirmed numbers. The two lists are input into the two-dimensional matrix for training a student identification model, and the campus roaming number set is matched with the student list and the parent list to obtain the set of numbers of uncertain identity on the campus
Figure BDA0002237981260000312
Based on the student list (i ∈ L_student) and the currently processed behavior matrices
Figure BDA0002237981260000313
a student behavior model is established:
Figure BDA0002237981260000314
The behavior matrix model is subjected to column normalization to obtain a standard student behavior model matrix
Figure BDA0002237981260000315
Then, for each number of uncertain (to-be-identified) identity i ∈ L_undefined, its behavior correlation value with the standard student is calculated. From the processed behavior matrix of the number
Figure BDA0002237981260000316
and the processed behavior matrix of the standard student
Figure BDA0002237981260000317
the behavior correlation value is calculated as follows:
Figure BDA0002237981260000321
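The exact formulas for the student behavior model and the behavior correlation value are given in the equation figures above. As a hedged illustration only, the sketch below assumes that the student behavior model is the element-wise mean of the confirmed students' processed matrices and that the correlation value is the Pearson correlation coefficient between the two flattened matrices; both choices and both function names are assumptions.

```python
from math import sqrt


def mean_matrix(matrices):
    """Element-wise mean of equally shaped matrices (assumed student model)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[a][b] for m in matrices) / n for b in range(cols)]
        for a in range(rows)
    ]


def behavior_correlation(act_i, act_student):
    """Pearson correlation between two flattened behavior matrices."""
    x = [v for row in act_i for v in row]
    y = [v for row in act_student for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant matrix has no defined correlation; report 0.0 in that case.
    return cov / (sx * sy) if sx and sy else 0.0
```

Under this assumption the value lies between -1 and 1, with larger values indicating behavior closer to that of the standard student.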
step S157, comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
Specifically, for all student numbers with confirmed identities i' ∈ L_student, the pre-calculated behavior correlation value corresponding to each number i' ∈ L_student is obtained
Figure BDA0002237981260000322
(calculated in the same manner as above); taking
Figure BDA0002237981260000323
r_cutoff is used as the standard for confirming the to-be-confirmed student numbers: the behavior correlation value corresponding to each number to be identified is compared with this standard, whether the user corresponding to the number is a student is identified according to the comparison result, the identification result corresponding to each number is obtained, and finally a student list and an accompanying person list are obtained. The student list is specifically:
Figure BDA0002237981260000324
The list of accompanying persons is
Figure BDA0002237981260000325
Then each identified student number is associated with a behavior tag, i.e.
Figure BDA0002237981260000326
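The comparison against r_cutoff can be sketched as a simple threshold test. How r_cutoff is derived from the confirmed students' correlation values is defined by the equation figure above; the sketch assumes, for illustration only, that it is the minimum of those values, and that candidates at or above the cutoff are students while the rest are accompanying persons.

```python
def classify_by_cutoff(candidate_scores, confirmed_student_scores):
    """Split candidate numbers into students and accompanying persons.

    candidate_scores: dict mapping each to-be-identified number to its
    behavior correlation value.
    confirmed_student_scores: non-empty list of correlation values of
    confirmed students.
    Assumption: r_cutoff is the minimum confirmed-student value.
    """
    r_cutoff = min(confirmed_student_scores)
    students = {n for n, r in candidate_scores.items() if r >= r_cutoff}
    accompanying = set(candidate_scores) - students
    return r_cutoff, students, accompanying
```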
It should be noted that the specific implementation process of step S151 to step S157 in this embodiment is the same as the specific process described in the embodiment corresponding to fig. 9, and is not described herein again, and reference may be made to the above description.
In this embodiment, the campus roaming number set is first obtained, each number in the campus roaming number set is identified based on user behavior data, and a student number list is output, so that the accuracy of student number identification can be improved.
Fig. 16 is a schematic structural diagram of a student identification device according to a fifth embodiment of the present invention, and as shown in fig. 16, the device includes: the number acquisition module 161, the data acquisition module 162 connected with the number acquisition module 161, and the identification module 163 connected with the data acquisition module 162, wherein:
the number obtaining module 161 is configured to obtain a campus roaming number set.
Specifically, the campus roaming number set is built from the numbers that enter the area where a campus is located and are captured by a campus-associated base station. Campus resident numbers are removed from the captured numbers, and the remaining numbers serve as the campus roaming number set; the time for which these numbers have appeared on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within the year preceding the day before the start of school are obtained, the numbers that have appeared for half a year or more are removed, and the remaining numbers serve as the campus roaming number set. A campus resident number is a number that has been present at the associated base station for more than half a year; such a number can be regarded as belonging to a person resident on the campus, whereas a number present for less than half a year may be regarded as a roaming number. The length of time may be a cumulative time of appearance, for example: if the total time of appearance within a year reaches more than half a year, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured mobile phone numbers, and that a base station may be associated with multiple schools, that is, the campus roaming number set may include the roaming number sets of multiple schools; preferably, the numbers are distinguished by school, that is, each school forms its own campus roaming number list. A roaming number may belong to a student of the school or to a parent, classmate, or friend of a student.
A data obtaining module 162, configured to obtain attribute data of the multiple roaming numbers;
Specifically, the attribute data of each number in the campus roaming number set L_SchRoam is obtained. The attribute data may include one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the user, and the behavior data of the user. That is, the attribute data may be any one of these three items, or may include all three at the same time, which is not limited herein. Further, the activity record set R_histRoam of all numbers in the campus roaming number set L_SchRoam over the last year can be extracted from the big data platform. The activity record r_{i,t,bs} corresponding to each number in the activity record set comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
The real-time mobile phone data stream r_{i,t,grid} of the user is obtained from a base station associated with the campus and comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the precise positioning grid number grid, the grid center longitude and latitude (gridlon/gridlat), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The real-time mobile phone data stream R_RTAccu corresponding to the campus roaming number set comprises the real-time mobile phone data stream r_{i,t,grid} corresponding to each number. The behavior data comes from the real-time mobile phone data stream R_RT of the campus-associated base station. The real-time mobile phone data stream R_RT is matched with the campus roaming number set L_SchRoam, and the numbers that do not belong to the campus roaming number set are removed, yielding the mobile phone data stream R_RTSchoam. The mobile phone data stream R_RTSchoam is classified and summarized by number i to obtain, for each number i ∈ L_SchRoam, the short-term mobile phone data stream R_{i,RTSchoam} accumulated on the campus. After the mobile phone data stream R_{i,RTSchoam} of each user i is obtained, data coding is performed on each mobile phone number, and the mobile phone browsing records r_{i,t,loc,url} are extracted from the mobile phone data stream R_{i,RTSchoam} of user i. Each browsing record comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bsid, the base station longitude and latitude (lon/lat), the accessed page classification (pageTypeId), the APP used (appId), the call/short message identifier actType (call, SMS, other), the call/short message opposite-end number conNum, and the like.
The identification module 163 is used for identifying students from the campus roaming number set based on the attribute data and outputting identification results;
specifically, each number in the campus roaming number set is identified based on the attribute data to obtain a corresponding identification result, where the identification result at least includes a student number list, and may also include a parent number list and an uncertain list, which is not limited herein.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In a preferred scheme of this embodiment, the apparatus presets a plurality of databases, including: a middle school campus database (such as campus base station data), a college campus building database (such as dormitories and colleges), and a confirmed identity database (such as identified numbers).
In an alternative form, the attribute data includes: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream, and the behavior data of the corresponding user. The identification module 163 specifically includes an identification unit and a merging unit, wherein: the identification unit specifically comprises a long-term identification model, an activity matching model and a behavior learning model, and the merging unit includes a merging model connected with each of the long-term identification model, the activity matching model and the behavior learning model, wherein:
the long-term identification model is used for identifying numbers of students from the campus roaming number set based on the activity records in the preset time period to obtain an identification result based on the activity records;
Specifically, numbers of students are identified from the campus roaming number set based on the activity records of each number in a preset time period, and a student number list based on the activity records is obtained. The identification operation is executed on each number in the campus roaming number set to obtain a corresponding identification result, and all the identification results form an identification list, which comprises a student number identification list, an accompanying number list of each number, and the like. The specific value of the preset time period may be set according to the actual situation and is not limited herein, for example: one year, half a year, or two years. Preferably, the preset time period is one year. The data that the model needs to prepare includes: a mapping of the middle school campuses, data blocks established for the middle school campuses, and a base station data list covering each campus;
the activity matching model is used for identifying the number of the student from the campus roaming number set based on the real-time mobile phone data stream to obtain an identification result based on the mobile phone data stream;
Specifically, the real-time mobile phone data stream acquired from the base station is matched with the campus roaming number set and each number is identified to obtain its identification result; the identification results of all numbers form the identification result based on the mobile phone data stream, which comprises a student number list and an accompanying number list of each student number. For this model, the buildings of colleges and universities need to be mapped to form data on all college campus buildings, a corresponding indoor base station data list covering each building needs to be established, and all activity records on all college campuses during the school term need to be stored;
the behavior learning model is used for identifying numbers of students from the campus roaming number set based on behavior data to obtain identification results based on the behavior data;
specifically, the behavior data acquired from the base station is matched against the campus roaming number set to obtain an identification result for each number, and the identification results of all numbers form the identification result based on the behavior data, which comprises a student number list and an accompanying number list for each student number. The behavior learning model requires the confirmed-identity information to be supplied in real time;
the merging model is used for merging the obtained recognition result based on the activity record, the recognition result based on the mobile phone data stream and the recognition result based on the behavior data, outputting the student recognition result, obtaining a final student number list, and updating the current student number list; in addition, the student recognition result may also comprise a list of parent numbers.
Preferably, the long-term identification model is specifically used for:
acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
specifically, the activity record set R_histRoam of all numbers in the campus roaming number set L_SchRoam over the last year is first extracted from a big data platform. The activity record r_{i,t,bs} corresponding to each number in the activity record set comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, other), the peer number conNum of the call/short message, and the incoming/outgoing identifier direction (in/out). The activity records of all numbers i are assembled into the activity record set R_histRoam;
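The record layout described above can be sketched as a small data structure. This is only an illustrative sketch: the field names follow the identifiers in the text (bs, lon/lat, actType, conNum, direction), while the class name `ActivityRecord` and the helper `build_record_set` are hypothetical and not part of the described method.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityRecord:
    """One activity record r_{i,t,bs} (field names are illustrative)."""
    number: str      # i, a member of the campus roaming number set L_SchRoam
    time: datetime   # recording time t
    bs: str          # recording base station
    lon: float       # base station longitude
    lat: float       # base station latitude
    act_type: str    # "call", "SMS" or "other"
    con_num: str     # peer number of the call / short message
    direction: str   # "in" or "out"


def build_record_set(records, roaming_numbers):
    """Group records by number, keeping only numbers in the roaming set."""
    by_number = defaultdict(list)
    for record in records:
        if record.number in roaming_numbers:
            by_number[record.number].append(record)
    return dict(by_number)
```

Grouping by number up front keeps the later per-number steps (residence and work site base station, accompanying sets) as simple dictionary lookups.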
Acquiring base station data corresponding to each number based on the activity record set;
specifically, each activity record is analyzed to identify the base station data of the corresponding number, for example the residence base station BS_{i,r} and the work site base station BS_{i,p} of the user of the number (for students, the work site base station is the school base station); that is, the residence position and the work site position of the user are identified;
acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of the corresponding base station;
specifically, after the work site base station and the residence base station of the user of each number are obtained, the sets of numbers appearing at the same base station are acquired for the work site base stations and the residence base stations respectively. The number sets include target number sets and accompanying number sets, and the target number sets include a number set for each residence base station and a number set for each work site base station. Based on the residence base station of each number, a residence number set is formed for each residence base station, and the residence accompanying number set of each number is obtained from it; then, based on the work site base station of each number, a work site number set is formed for each work site base station, and the work site accompanying number set of each number is obtained from it. For example, the numbers sharing the same residence base station BS_{i,r} are collected to form the number set of each residence base station, and the numbers sharing the same work site base station BS_{i,p} are collected to form the number set of each work site base station; according to the work site base station BS_{i,p}, the accompanying number set of each number is obtained:
Figure BDA0002237981260000361
According to the residence base station BS_{i,r}, the accompanying number set of each number is obtained:
Figure BDA0002237981260000362
It should be noted that the work site base station, the work site number set, and the work site accompanying number set may be obtained first, followed by the residence base station, the residence number set, and the residence accompanying number set; alternatively, both groups may be obtained at the same time. The order is not limited here.
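The grouping of numbers by shared base station can be sketched as follows. The sketch assumes that each number has already been assigned a single base station (residence or work site) and that a number's accompanying set is the set of other numbers at the same base station, excluding the number itself; the exact definition is given by a formula that is not reproduced in this text, so the exclusion is an assumption. The function names are hypothetical.

```python
from collections import defaultdict


def number_sets_by_station(station_of):
    """Map each base station to the set of numbers assigned to it.

    station_of: dict mapping number -> base station id (None if unknown).
    """
    sets = defaultdict(set)
    for number, bs in station_of.items():
        if bs is not None:
            sets[bs].add(number)
    return dict(sets)


def companion_sets(station_of):
    """For each number, the other numbers that share its base station."""
    sets = number_sets_by_station(station_of)
    result = {}
    for number, bs in station_of.items():
        # Assumption: a number is not its own companion.
        result[number] = (sets[bs] - {number}) if bs is not None else set()
    return result
```

The same two functions can be applied once to the residence assignments and once to the work site assignments, in either order, which matches the note that the order of the two derivations does not matter.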
Acquiring a student number set corresponding to each school and an accompanying number set of each number based on the number sets;
specifically, the campus base station data obtained in advance is matched against each work site number set to form the student number set of the corresponding campus, and the accompanying number set of each student on the campus is obtained based on the student number set. For example, the number sets sharing the same work site base station BS_{i,p} obtained above are matched with the base stations associated with the school
Figure BDA0002237981260000371
Obtaining the student number set corresponding to each school
Figure BDA0002237981260000372
The school may be a university, a middle school, or an elementary school, which is not limited here; preferably, the school is a college or a middle school. The accompanying number set of each student number is obtained from the student number set and the accompanying number set of each student on the campus:
Figure BDA0002237981260000373
In a preferred aspect of this embodiment, the method is further configured to:
acquiring a social relationship set based on the student number set and the accompanying number set;
specifically, a corresponding social relationship set is obtained according to the student number set and the accompanying number set of each student. The social relationship set includes: a family number set, a classmate number set and a friend number set;
in a further preferred embodiment of this embodiment, a specific implementation process for acquiring the base station data corresponding to each number based on the activity record set is as follows:
respectively extracting the activity record of the student vacation period and the activity record of the student non-vacation period corresponding to one number in the campus roaming number set from the activity record set;
specifically, the activity record of the student vacation and the activity record of the student non-vacation corresponding to one number in the campus roaming number set are respectively extracted from the activity record set, and the activity record process of the student vacation is extracted as follows:
from the activity record set R_histRoam, the records falling within each student vacation period (winter vacation and/or summer vacation)
Figure BDA0002237981260000374
within the preset time period are extracted as
Figure BDA0002237981260000375
In each
Figure BDA0002237981260000376
all numbers are extracted to form a number set
Figure BDA0002237981260000377
For each number
Figure BDA0002237981260000378
all records corresponding to that number during the student vacation are acquired and divided into records for weekday daytime, weekday night, and public holidays, respectively:
Figure BDA0002237981260000381
Note that the vacation period V_n is defined relative to students, whereas weekdays are defined relative to non-students, so as to distinguish them from public holidays (e.g., weekends, statutory holidays, etc.).
Obtaining a residence base station corresponding to the number based on the activity record of the student vacation;
specifically, the residence base station corresponding to the number is obtained based on the activity records of the student vacation. First, the base station data of the number is acquired separately for weekday daytime, weekday night, and public holidays during the student vacation. Then, from the base station data of each of the three periods, the base station that appears on the largest number of days is extracted, giving the target base station data for weekday daytime, weekday night, and public holidays. The target base station data is compared with the corresponding preset thresholds to obtain the corresponding comparison results, and the residence base station of the number is obtained based on the comparison results. For example, from each of the sets
Figure BDA0002237981260000382
the base station appearing on the most days is obtained separately, namely
Figure BDA0002237981260000383
together with the corresponding numbers of days
Figure BDA0002237981260000384
The corresponding numbers of days are compared with the corresponding preset thresholds, which are:
Figure BDA0002237981260000385
and the corresponding comparison results are:
Figure BDA0002237981260000386
where res = 1 if D ≥ Thre and res = 0 if D < Thre; Thre is a preset threshold, res is the comparison result, and D is the number of days. For each number i, over all base stations that occurred in the comparison
Figure BDA0002237981260000387
the sum is taken as
Figure BDA0002237981260000388
and then
Figure BDA0002237981260000389
where BS_{i,r} is the residence base station of number i. Preferably, a residence frequency tag is added to this number, the frequency being:
Figure BDA0002237981260000391
The specific values of the preset thresholds
Figure BDA0002237981260000392
may be set according to the actual situation and are not limited here; for example, they may be set according to the length of the vacation or according to other conditions.
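The threshold comparison above can be illustrated with a short sketch. The per-period step (pick the base station seen on the most days, then set res = 1 when its day count D reaches the threshold Thre) follows the text. How the three comparison results are combined into BS_{i,r} is given by formulas not reproduced here, so the sketch assumes one plausible reading: the day counts of the periods that pass their threshold are summed per base station and the largest total wins. The period names, function names and tie-breaking rule are hypothetical.

```python
from collections import defaultdict

PERIODS = ("weekday_day", "weekday_night", "holiday")


def top_station_per_period(observations):
    """observations: iterable of (date, period, base_station) tuples.

    Returns {period: (base_station, days)} for the station seen on most days.
    """
    days = {p: defaultdict(set) for p in PERIODS}
    for date, period, bs in observations:
        days[period][bs].add(date)
    top = {}
    for period in PERIODS:
        if days[period]:
            # Ties are broken by station id so the result is deterministic.
            bs, dates = max(days[period].items(),
                            key=lambda kv: (len(kv[1]), kv[0]))
            top[period] = (bs, len(dates))
    return top


def residence_station(observations, thresholds):
    """Assumed combination rule: sum res * D per station, take the maximum."""
    totals = defaultdict(int)
    for period, (bs, d) in top_station_per_period(observations).items():
        res = 1 if d >= thresholds[period] else 0  # res = 1 iff D >= Thre
        totals[bs] += res * d
    if not totals or max(totals.values()) == 0:
        return None  # no period passed its threshold
    return max(totals.items(), key=lambda kv: (kv[1], kv[0]))[0]
```

The same routine could be reused for the work site base station by feeding it non-vacation records restricted to campus base stations.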
Obtaining a corresponding work place base station based on the activity record of the non-vacation period of the student;
specifically, similar to the process of acquiring the residence base station, the corresponding work site base station is obtained from the non-vacation activity records of the number. Preferably, the non-vacation activity records are divided into three parts, namely the activity records of weekday daytime, weekday night, and public holidays; based on these records, the campus base station that appears most frequently in each of the three periods is obtained, each obtained base station is compared with the corresponding set value, and the work site (school) base station is obtained according to the comparison results. The set values may be set according to the actual situation and are not limited here.
After the work site base station and the residence base station of one number are obtained, the same process is repeated for the user of the next number, until the work site base station and the residence base station of every number in the campus roaming number set are obtained. It should be noted that, for students, the work site base station is the school base station.
In a further preferred embodiment of this embodiment, a specific implementation process of obtaining the social relationship set based on the student number set and the companion number set is as follows:
acquiring a family number set of the student based on the residence accompanying number set corresponding to the number of the student and the contact number set corresponding to the number;
for example, for each student i ∈ L_{school,1}, the residence accompanying number set
Figure BDA0002237981260000393
is extracted, and then the contact number set of number i
Figure BDA0002237981260000394
is extracted; the intersection of the accompanying number set and the contact number set is taken to obtain the family number set;
extracting a classmate number set of the student from a workplace number set corresponding to the number;
for example: for each student i ∈ L_{school,1}, the classmate number set is extracted as
Figure BDA0002237981260000401
Obtaining a friend number set based on the classmate number set and a contact number set corresponding to the number;
for example: for each student i ∈ L_{school,1}, the contact number set is extracted from the call relationships
Figure BDA0002237981260000402
and the intersection of the classmate number set and the contact number set is taken to obtain the friend number set F.
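The three social relationship sets reduce to plain set intersections, as the following minimal sketch shows. The function name and dictionary keys are hypothetical; the inputs are assumed to be Python sets of phone numbers for a single student.

```python
def social_relationship_sets(residence_companions, classmates, contacts):
    """Derive family, classmate and friend sets for one student number.

    residence_companions: numbers sharing the student's residence base station
    classmates: numbers sharing the student's work site (school) base station
    contacts: numbers the student has called or messaged
    """
    return {
        # Family: lives at the same place and is in contact with the student.
        "family": residence_companions & contacts,
        "classmates": set(classmates),
        # Friends: classmates the student actually contacts.
        "friends": classmates & contacts,
    }
```

For instance, with residence companions {"P1", "N1"}, classmates {"C1", "C2"} and contacts {"P1", "C2"}, the family set is {"P1"} and the friend set is {"C2"}.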
In a preferred aspect of this embodiment, the activity matching model is specifically configured to: acquiring a mobile phone data stream corresponding to a number in a campus roaming number set;
specifically, the mobile phone data stream r_{i,t,grid} is obtained from a base station associated with the campus and comprises: the number i ∈ L_SchRoam, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the precise-positioning grid number grid, the grid center longitude and latitude (gridlon/gridlat), the call/short message identifier actType (call, SMS, other), the peer number conNum of the call/short message, and the incoming/outgoing identifier direction (in/out). The real-time mobile phone data stream R_RTAccu corresponding to the campus roaming number set comprises the real-time mobile phone data stream r_{i,t,grid} of each number, r_{i,t,grid} ∈ R_RTAccuSchRoam.
Acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
specifically, position data of a user is obtained according to the obtained mobile phone data stream, and identification is carried out according to the position data to obtain a corresponding identification result;
and after the identification result of one number is obtained, executing the same identification operation on the next number to obtain a corresponding identification result until the identification result of each number in the campus roaming number set is obtained, and obtaining a student number list.
In a further preferred solution of this embodiment, the position data corresponding to the user is obtained based on the mobile phone data stream, and the specific implementation process of obtaining the corresponding identification result by performing identification according to the position data is as follows:
acquiring position data of a corresponding number appearing in a bedtime period;
specifically, the position data includes the position information of the number acquired by the corresponding base station. Because a user may move between different positions during a day, several grids may correspond to those positions; the position data therefore includes a plurality of grids (i.e., the grids in which the number appears), and each grid carries information such as the grid number and the grid center longitude and latitude.
Matching the corresponding user based on the position data appearing in the bedtime period to obtain a first matching result;
specifically, the grids in which the number appears during the bedtime period are analyzed and matched against the grids of each dormitory on the campus to obtain the first matching result, which is the dormitory of the user corresponding to the number.
Matching the corresponding department based on the acquired position data to obtain a second matching result;
specifically, the obtained grids are matched against the grids in which the department of each activity in the activity list is located, to obtain the corresponding second matching result. The second matching result is the department of the user corresponding to the number; for example, the grid in which the number appears is matched against the corresponding department contour polygon to obtain the second matching result.
Obtaining an identification result based on the first matching result and the second matching result;
specifically, the identification result of the number is obtained by combining the first matching result and the second matching result; for example, the dormitory of the user corresponding to the number is obtained from the first matching result, the department of the user is obtained from the second matching result, whether the user is a student is determined from the two, and the corresponding result is output.
Acquiring a companion number set of students based on the campus roaming number set;
specifically, after the number is matched with a department and a dormitory, the user corresponding to the number is identified as a student, and the accompanying number set corresponding to the student is then obtained based on the campus roaming number set;
further, for each number i ∈ L_SchRoam, if neither of its associated values Dorm_i and Dept_i is 0, then i ∈ L_{School,2}; the other numbers in the campus roaming number set form the accompanying number set
Figure BDA0002237981260000411
At the same time, for each number i ∈ L_SchRoam, the corresponding Dorm_i and Dept_i, together with the dormitory frequency
Figure BDA0002237981260000412
and the department frequency
Figure BDA0002237981260000413
are output as the dormitory and department labels of number i; for a number i that does not belong to L_{School,2}, the corresponding
Figure BDA0002237981260000414
and
Figure BDA0002237981260000415
values are both 0.
In a further preferred embodiment of this embodiment, the specific implementation process of obtaining the first matching result based on matching the corresponding user with the location data appearing in the bedtime period is as follows:
acquiring the occurrence frequency of each grid in a sleeping time period;
specifically, the grids appearing in the bedtime period and their numbers of appearances are obtained from the mobile phone data stream; the bedtime period is the sleeping time set by the school, for example from 10 o'clock at night to 7 o'clock in the morning;
selecting a preset number of grids from a plurality of grids appearing in a bedtime period, wherein the number of the appearance times of any selected grid in the bedtime period is more than the number of the appearance times of any unselected grid in the bedtime period;
specifically, because the grids appear different numbers of times, the grids are sorted by number of appearances and a preset number of the most frequent grids is selected. The preset number may be set according to the actual situation and is not limited here, such as 3 or 5. For example, if 10 grids have appeared, they are sorted by number of appearances from high to low and the top five are selected;
Respectively matching the grid of each dormitory on the campus with the selected grids to obtain a first matching result;
specifically, because each dormitory has a corresponding contour polygon at its position, each selected grid is matched against the contour polygon by judging whether the center longitude and latitude of the grid are enclosed by the contour polygon; if so, the grid matches the dormitory.
For ease of understanding, the identification process is described in detail below:
because more than one grid appears for a number in the bedtime period and the numbers of appearances differ, for the mobile phone data stream records r_{i,t,grid} ∈ R_RTAccuSchRoam of each number, the number of times each grid appears within the bedtime period
Figure BDA0002237981260000421
is counted, and for each number i ∈ L_SchRoam the corresponding bedtime grid activity vector is obtained:
Figure BDA0002237981260000422
the activity vector gives, for number i ∈ L_SchRoam, the number of times it appears in grid u at bedtime (u denotes the grid number). For each number i ∈ L_SchRoam, the five grid numbers G_{i,s}(1,...,5) with the highest values in
Figure BDA0002237981260000423
are taken every day. The contour polygon of each dormitory
Figure BDA0002237981260000431
is then tested against the grid center longitude and latitude G_{i,s}(gridlon_{i,s}, gridlat_{i,s}) of each G_{i,s} for containment (i.e., whether the center longitude and latitude lie inside the contour polygon). If the center is enclosed by the contour polygon, the grid matches, and the dormitory match count is incremented by 1 for each successful match, i.e., Dorm_{i,s} = Dorm_{i,s} + 1, and
Figure BDA0002237981260000432
The dormitory k with the highest daily Dorm_{i,s} value is the best judgment Dorm_i of the dormitory of the user of number i on that day. Because people move around (for example, students visit other dormitories), the value of Dorm_i may fluctuate during an initial period (e.g., the first one or two weeks after the start of term) but settles within one or two weeks; Dorm_i is then taken as the output value and a label is added to number i, carrying the dormitory number and the corresponding frequency, the frequency value being
Figure BDA0002237981260000433
where s is the dormitory number corresponding to Dorm_i.
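The daily dormitory judgment described above (top grids by bedtime appearances, then a containment test of each grid center against every dormitory contour polygon) can be sketched as follows. The containment test uses the standard ray-casting method, which is one common way to implement it and is not stated in the text; polygons are assumed to be simple lists of (lon, lat) vertices, and all function and parameter names are hypothetical.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon's vertex list?"""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Count edges that a horizontal ray from the point crosses.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def best_dorm_for_day(grid_hits, grid_centers, dorm_polygons, top_n=5):
    """Pick the dormitory matched by most of the top_n bedtime grids.

    grid_hits: list of grid ids, one entry per bedtime record of the number
    grid_centers: grid id -> (gridlon, gridlat)
    dorm_polygons: dormitory id -> list of (lon, lat) vertices
    """
    top_grids = [g for g, _ in Counter(grid_hits).most_common(top_n)]
    matches = Counter()
    for grid in top_grids:
        lon, lat = grid_centers[grid]
        for dorm, polygon in dorm_polygons.items():
            if point_in_polygon(lon, lat, polygon):
                matches[dorm] += 1  # Dorm_{i,s} = Dorm_{i,s} + 1
    return matches.most_common(1)[0][0] if matches else None
```

Points lying exactly on a polygon edge are not treated specially in this sketch; a production version would need to decide how such boundary cases are handled.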
In a further preferred embodiment of this embodiment, the specific implementation process of obtaining the second matching result based on the obtained location data matching the corresponding department is as follows:
acquiring grid data of the corresponding number appearing in each department activity time period based on the position data;
specifically, the grid data appearing in each department activity time period is obtained based on the position data, wherein the grid data comprises the grids and the corresponding numbers of appearances;
matching the grid with the largest number of appearances with the position of the corresponding department to obtain a second matching result;
specifically, because more than one grid appears during the activity time and the numbers of appearances differ, the grids are sorted by number of appearances to obtain the most frequent grid for each department activity; the most frequent grid is then matched with the position of the corresponding department to obtain the department of the user of the number.
For ease of understanding, the identification process is described in detail below:
obtaining a campus activity plan comprising a plurality of department activity entries Act(ActName_h, t_{h,1}, t_{h,2}, Dept_h). In each activity time period (t_{h,1}, t_{h,2}), for each number i ∈ L_SchRoam, every grid gridId in which it appeared during the period and the number of appearances are computed, and the grid G_{i,h} in which number i is most active during activity h is determined. The department contour polygon corresponding to activity h
Figure BDA0002237981260000441
is tested against the grid center longitude and latitude of G_{i,h}
Figure BDA0002237981260000442
for containment, i.e., whether G_{i,h} is enclosed by the department contour polygon. Alternatively, the difference between the distance from G_{i,h} to the center point of the department contour polygon and the maximum distance from that center point to the edges of the polygon may be computed (a difference greater than 0 indicates that the grid is outside the polygon; otherwise it is not outside). If the grid is enclosed, the match succeeds, and the department match count is incremented by 1 for each successfully matched activity, i.e.,
Figure BDA0002237981260000443
and
Figure BDA0002237981260000444
On a daily basis, the department dept_h with the highest value of
Figure BDA0002237981260000445
is the best judgment Dept_i of the department of the user of number i. The value Dept_i may fluctuate during an initial period (e.g., one week) but stabilizes after about a week, because students' activities settle over time. A tag is then added to number i, carrying the department number, the frequency, and so on, the frequency value being
Figure BDA0002237981260000446
where dept_h is the department number corresponding to Dept_i.
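Both the dormitory and the department judgments are made daily and then left to stabilize before a tag is written. One simple way to turn a window of daily judgments into a single label is a majority vote, sketched below. The frequency attached to the tag is defined in the text by a formula that is not reproduced here; this sketch assumes it is the share of observed days on which the winning label was chosen. The function name is hypothetical.

```python
from collections import Counter


def stable_label(daily_judgments):
    """Majority label over a window of daily judgments, with its frequency.

    daily_judgments: list of labels (dormitory or department ids), one per
    day; None marks a day on which nothing matched.
    Returns (label, frequency), or (None, 0.0) if no day produced a match.
    """
    votes = Counter(j for j in daily_judgments if j is not None)
    if not votes:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    # Assumed definition: fraction of all days in the window.
    return label, count / len(daily_judgments)
```

A caller could require the frequency to exceed some minimum before treating the label as settled, which reflects the remark that early judgments may fluctuate during the first week or two.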
In a preferred aspect of this embodiment, the behavior learning model is specifically configured to:
acquiring a mobile phone data stream set from a base station associated with a campus;
specifically, a real-time mobile phone data stream set R_RT is acquired from a base station associated with the campus; the mobile phone data stream set comprises the data stream records of a plurality of numbers, and each data stream record comprises corresponding behavior data;
matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
specifically, the records corresponding to numbers that do not belong to the campus roaming number set L_SchRoam are first removed from the mobile phone data stream set. In other words, each number in the campus roaming number set L_SchRoam is compared with the mobile phone data stream set to obtain the data stream records corresponding to each number in L_SchRoam, giving the mobile phone data stream subset R_RTSchRoam;
Acquiring behavior data corresponding to a number from a mobile phone data stream subset;
specifically, the mobile phone data stream of each number i ∈ L_SchRoam is obtained from R_RTSchRoam
Figure BDA0002237981260000451
Behavior data coding is performed for each number i based on the mobile phone data stream, and the mobile phone browsing record r_{i,t,loc,url} is extracted. The mobile phone browsing record comprises: the number i ∈ L_SchRoam, the recording time t (time of occurrence), the recording base station bsid, the base station longitude and latitude (lon/lat), the accessed page data (pageTypeId), the used APP data (appId), the call/short message identifier actType (call, SMS, other), the peer number conNum of the call/short message, and the incoming/outgoing identifier direction (in/out).
Substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
specifically, a two-dimensional matrix is established in advance as follows: all accessed page data (pageTypeId) and used APP data (appId) are arranged in order along one axis (the order depends on the encoding of pageTypeId and appId), the recording base stations bsid are arranged along the other axis, and a two-dimensional matrix of the usage behavior of the user of number i is constructed, expressed as:
Figure BDA0002237981260000452
the behavior data is substituted into the two-dimensional matrix, and for each number i ∈ L_SchRoam a corresponding two-dimensional array is constructed:
Figure BDA0002237981260000453
where the rows (1…c) correspond to the c sorted appId and pageTypeId values and the columns (1…m) correspond to the m sorted recording base stations bsid;
Figure BDA0002237981260000461
denotes the number of times the user of number i uses the α-th application or browses the α-th web page content category under the β-th base station.
Normalizing the behavior matrix to obtain a processed behavior matrix;
specifically, column normalization is performed on each behavior matrix Act_i to obtain the processed behavior matrix
Figure BDA0002237981260000462
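The construction and normalization of the behavior matrix can be sketched in a few lines. Rows are the sorted appId/pageTypeId categories and columns are the sorted base stations, as in the text. The normalization formula itself is not reproduced here, so the sketch assumes the common choice of dividing each column by its sum; the function names and the (category, bsid) event format are hypothetical.

```python
def build_behavior_matrix(events, categories, stations):
    """Count usage events into a c x m matrix (rows: categories, cols: bsid).

    events: iterable of (category, bsid) pairs for one number, where
    category is an appId or pageTypeId value.
    """
    row = {c: k for k, c in enumerate(categories)}
    col = {s: k for k, s in enumerate(stations)}
    matrix = [[0.0] * len(stations) for _ in categories]
    for category, bsid in events:
        if category in row and bsid in col:
            matrix[row[category]][col[bsid]] += 1.0
    return matrix


def normalize_columns(matrix):
    """Assumed column normalization: each non-empty column sums to 1."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    sums = [sum(r[j] for r in matrix) for j in range(n_cols)]
    return [[(r[j] / sums[j]) if sums[j] else 0.0 for j in range(n_cols)]
            for r in matrix]
```

Normalizing per column makes the usage profile at each base station comparable between heavy and light users, since only the relative mix of applications and page categories remains.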
Calculating corresponding behavior correlation values based on the processed behavior matrix and the standard student behavior matrix;
specifically, at the start of each school term, every school runs a campus marketing campaign in which the true identity of a portion of the numbers is confirmed, yielding a student list L_student and a parent list L_family. The two lists are input into the two-dimensional matrix to train the student identification model, and the campus roaming number set is matched against the student list and the parent list to obtain the set of numbers on the campus whose identity is undetermined
Figure BDA0002237981260000463
Based on the student list i ∈ L_student and the current processed behavior matrices
Figure BDA0002237981260000464
a student behavior model is established:
Figure BDA0002237981260000465
Column normalization is applied to this behavior matrix model to obtain the standard student behavior model matrix
Figure BDA0002237981260000466
Then, for each number i ∈ L_undefined whose identity is undetermined (to be identified), the behavior correlation value with the standard student is calculated from the processed behavior matrix of that number
Figure BDA0002237981260000471
and the processed behavior matrix of the standard student
Figure BDA0002237981260000472
The behavior correlation value is calculated as follows:
Figure BDA0002237981260000473
comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
specifically, for each student number i' ∈ L_student whose identity has been confirmed, the corresponding behavior correlation value
Figure BDA0002237981260000474
is calculated in advance. A cutoff value is then computed: over the behavior correlation values
Figure BDA0002237981260000475
of all confirmed student numbers i' ∈ L_student, take
Figure BDA0002237981260000476
and use r_cutoff as the criterion for confirming the numbers to be identified. The behavior correlation value of each number to be identified is compared with this criterion, whether the user of the number is a student is determined from the comparison result, and the identification result of each number is obtained. Finally, a student list and an accompanying person list are obtained. The student list is:
Figure BDA0002237981260000477
The accompanying person list is
Figure BDA0002237981260000478
Each identified student number is then tagged with a behavior label, i.e.,
Figure BDA0002237981260000479
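The comparison against r_cutoff can be illustrated as below. Neither the behavior correlation formula nor the rule for deriving r_cutoff is reproduced in this text, so the sketch makes two clearly labeled assumptions: the correlation is taken to be the Pearson correlation between the flattened behavior matrices, and r_cutoff is simply passed in as a parameter. Numbers at or above the cutoff are treated as students and the rest as accompanying persons. All function names are hypothetical.

```python
import math


def correlation(matrix_a, matrix_b):
    """Assumed measure: Pearson correlation of the flattened matrices."""
    xs = [v for row in matrix_a for v in row]
    ys = [v for row in matrix_b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant matrix carries no usable pattern
    return cov / math.sqrt(var_x * var_y)


def classify(unknown, standard, r_cutoff):
    """Split undetermined numbers into (students, companions).

    unknown: number -> processed behavior matrix
    standard: standard student behavior model matrix
    """
    students, companions = [], []
    for number, matrix in unknown.items():
        if correlation(matrix, standard) >= r_cutoff:
            students.append(number)
        else:
            companions.append(number)
    return students, companions
```

One possible way to choose the cutoff, offered only as an example, is the smallest correlation value observed among the confirmed students, so that every confirmed student would pass the test.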
Further, the merging module 153 is specifically configured to: inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list;
still further, the merged model is a two-layer neural network model consisting of a first-layer neural network and a second-layer neural network; the first layer includes three neurons and the second layer includes two neurons; the first layer receives the three recognition results, specifically:
Figure BDA0002237981260000481
the two neurons of the second layer comprise two weight matrices, respectively:
Figure BDA0002237981260000482
The merged model comprises the following structure:
Figure BDA0002237981260000483
Figure BDA0002237981260000484
Figure BDA0002237981260000485
wherein,
Figure BDA0002237981260000486
the cost function associated with the activation function of any neuron is:
Figure BDA0002237981260000487
for the identity of the input function (i.e. the user of number i) confirmed in the marketing campaign,
Figure BDA0002237981260000491
are, respectively, the offset (bias) vectors,
Figure BDA0002237981260000492
The three recognition results are input into the three neurons of the first-layer neural network and combined into:
Figure BDA0002237981260000493
The result output by the first-layer neural network is input into the second-layer neural network for learning and training, and the output is a student identity or an accompanying-parent identity. If the marketing campaign confirms the user as a student, the is_student component of the vector = 1; if it confirms the user as an accompanying parent, the is_family component of the vector = 1; if there is no marketing campaign confirmation information, both are 0 (the sample is discarded); if marketing campaign feedback confirms both, both are 1 (this sample is also discarded). The output result is:
Figure BDA0002237981260000494
The cutoff for both component decisions corresponds to a value of 0.5.
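A forward pass through a network of this shape (three first-layer neurons, two second-layer neurons, cutoff 0.5 per output component) might look like the sketch below. The activation function is defined in a figure not reproduced here, so a sigmoid is assumed; the weights, biases, and function names are hypothetical placeholders, not values from the source.

```python
import math


def sigmoid(x):
    # Assumed activation; the source defines its activation in a figure.
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: one weight row and one bias per neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def merge_forward(results, w1, b1, w2, b2, cutoff=0.5):
    """results: the three recognition results for one number.

    Returns the raw (is_student, is_family) outputs and the two
    thresholded decisions (>= cutoff is an assumption).
    """
    hidden = layer(results, w1, b1)   # first layer: three neurons
    out = layer(hidden, w2, b2)       # second layer: two neurons
    return out, (out[0] >= cutoff, out[1] >= cutoff)


# Hypothetical parameters, chosen only to make the example easy to follow.
w1 = [[0.0, 0.0, 0.0]] * 3
b1 = [0.0, 0.0, 0.0]
w2 = [[2.0, 2.0, 2.0], [-2.0, -2.0, -2.0]]
b2 = [0.0, 0.0]
out, decisions = merge_forward([1, 0, 1], w1, b1, w2, b2)
```

In practice the parameters would come from training rather than being set by hand.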
In this embodiment, the training data of the merging model comes from the confirmed student list and the confirmed parent list fed back by the marketing campaign, and the model is trained by backpropagation. During the start-of-year period each year, the student list and the parent list are updated daily, and the merging model is retrained daily on the updated lists to obtain an updated merging model. The more base data the model has been trained on, the more reliable its recognition becomes.
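Preparing the training samples from the marketing campaign feedback, including the rule that samples confirmed as neither or as both are discarded, can be sketched as below. The data layout (a mapping from number to its three recognition results) and all names are assumptions made for illustration.

```python
def build_training_samples(recognition_results, confirmed_students, confirmed_parents):
    """Build (features, label) pairs for the merging model.

    recognition_results: {number: (r_activity, r_stream, r_behavior)}
    Labels are (is_student, is_family). Samples with no confirmation
    (0, 0) or with both confirmations (1, 1) are discarded.
    """
    samples = []
    for number, features in recognition_results.items():
        is_student = int(number in confirmed_students)
        is_family = int(number in confirmed_parents)
        if is_student == is_family:
            continue  # both 0 or both 1: not usable for training
        samples.append((features, (is_student, is_family)))
    return samples


# Hypothetical data for four numbers.
results = {"1": (1, 0, 1), "2": (0, 0, 1), "3": (1, 1, 1), "4": (0, 1, 0)}
samples = build_training_samples(results, {"1", "3"}, {"2", "3"})
```

The resulting pairs would then be passed to whatever backpropagation routine trains the two-layer network.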
The merging process is as follows:
for each unidentified number
Figure BDA0002237981260000495
the corresponding three recognition results are fed into the merging model and the recognition is recalculated; where the output result is greater than or equal to the preset value:
Figure BDA0002237981260000501
a list of inferred students is formed on the basis of the output result:
Figure BDA0002237981260000502
and a list of inferred parents:
Figure BDA0002237981260000503
However, a number may appear in both inferred lists, that is, in both the inferred student list and the inferred parent list.
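The formation of the two inferred lists, and the detection of numbers that land in both, can be sketched as follows. The preset value is taken to be the 0.5 cutoff mentioned earlier, which is an assumption about how the two passages relate; the names and sample outputs are hypothetical.

```python
def build_inferred_lists(outputs, cutoff=0.5):
    """outputs: {number: (is_student_score, is_family_score)}.

    Returns the inferred student set, the inferred parent set, and the
    numbers that appear in both.
    """
    students = {n for n, (s, _) in outputs.items() if s >= cutoff}
    parents = {n for n, (_, f) in outputs.items() if f >= cutoff}
    return students, parents, students & parents


# Hypothetical merging-model outputs.
outputs = {"a": (0.9, 0.1), "b": (0.2, 0.7), "c": (0.6, 0.5), "d": (0.1, 0.2)}
students, parents, in_both = build_inferred_lists(outputs)
```

Numbers in the overlap are the ones whose tags need the extra confirmation described next.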
After the inferred lists are obtained, the tags of each number in the lists must be confirmed: the three recognition results are compared with the result of feeding them into the merging model, and the tags from the three recognition results are selectively carried over to the final user tags. The specific process is as follows:
Confirmation of the companion tag:
Figure BDA0002237981260000504
the companion tag is retained;
Figure BDA0002237981260000505
& i ∈ L_School,1: the companion tag is temporarily withheld, i.e. it is not kept in the list but stored for later use.
Confirmation of the dormitory and department tags:
Figure BDA0002237981260000506
& i ∈ L_School,2: the department and dormitory tags are retained;
Figure BDA0002237981260000507
& i ∈ L_School,2: none of its tags is used for the time being, i.e. the tags are not kept in the list but stored for later use.
After the student list and the parent list confirmed by marketing campaign feedback are updated, the corresponding tags need to be confirmed again for each number i newly added to any of the lists.
Confirmation of the companion tag:
Figure BDA0002237981260000508
the companion tag is retained;
Figure BDA0002237981260000509
the corresponding tag is discarded.
Confirmation of the dormitory and department tags:
Figure BDA00022379812600005010
the dormitory and department tags are retained;
Figure BDA0002237981260000511
the corresponding tag is discarded.
in this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In another variation of the present invention, the number may be identified by using only the long-term identification model, the activity matching model, or the behavior learning model to obtain the corresponding student list, and the identification process is consistent with the identification process of each model in the identification module 163 shown in fig. 16, which may specifically refer to the above description and is not repeated here.
The embodiment of the invention provides a nonvolatile readable computer storage medium, wherein the computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the student identification method in any method embodiment.
Embodiments of the present invention provide a computer program product comprising a computer program stored on a computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform a student identification method in any of the above-mentioned method embodiments.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
Fig. 17 is a schematic structural diagram of an embodiment of the apparatus according to the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the apparatus.
As shown in fig. 17, the apparatus may include: a processor 1702, a communications interface 1704, a memory 1706, and a communication bus 1708.
Wherein: the processor 1702, communication interface 1704, and memory 1706 communicate with one another via a communication bus 1708. A communication interface 1704 for communicating with network elements of other devices, such as clients or other servers. Processor 1702, configured to execute program 1710, may specifically execute relevant steps in the above-described student identification method embodiment.
In particular, the program 1710 may include program code including computer operating instructions.
The processor 1702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
A memory 1706 for storing the program 1710. The memory 1706 may include a high-speed RAM and may also include a non-volatile memory, such as at least one disk memory.
The program 1710 may specifically be configured to cause the processor 1702 to perform the following:
acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value;
obtaining attribute data of the plurality of roaming numbers, each of the attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user;
and identifying students from the campus roaming number set based on the attribute data, and outputting an identification result, wherein the identification result at least comprises a student list.
In an optional manner, the attribute data includes the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream, and the behavior data of the corresponding user, and the program 1710 causes the processor 1702 to perform the following operations:
identifying numbers of students from the campus roaming number set respectively based on activity records of the corresponding users in a preset time period, real-time mobile phone data streams and behavior data of the corresponding users to obtain corresponding student lists;
and merging the obtained student lists, and outputting student identification results.
In an alternative approach, the program 1710 causes the processor 1702 to:
identifying numbers of students from the campus roaming number set based on the activity records of the preset time period to obtain an identification result based on the activity records;
identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
and identifying numbers of students from the campus roaming number set based on the behavior data to obtain identification results based on the behavior data.
In an alternative approach, the program 1710 causes the processor 1702 to: identify the numbers of students from the campus roaming number set based on the activity records of the preset time period, and output an identification result based on the activity records.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
acquiring base station data corresponding to each number based on the activity record set, wherein the base station data comprises a residential base station and a working place base station;
acquiring numbers appearing in the same base station based on base station data corresponding to each number to obtain a number set of the corresponding base station, wherein the number set comprises a target number set and an accompanying number set;
and acquiring a student number set corresponding to each school and a companion number set of each number based on the number sets.
In an alternative, the program 1710 causes the processor 1702 to:
respectively extracting the activity record of the student vacation and the activity record of the student non-vacation corresponding to one number in the campus roaming number set from the activity record set;
obtaining a residence base station corresponding to the number based on the activity record of the student vacation;
obtaining a work place base station corresponding to the number based on the activity record of the non-holiday of the student;
and repeating the steps until the residential area base station and the working area base station of each number in the campus roaming number set are obtained.
In an alternative approach, the program 1710 causes the processor 1702 to:
respectively acquiring base station data of the number for workday daytime, workday nighttime, and public holidays during the student vacation;
extracting, from the base station data acquired for workday daytime, workday nighttime, and public holidays respectively, the base station that appears on the most days, to obtain the target base station data for each of the three periods;
comparing the obtained target base station data with corresponding preset threshold values respectively to obtain corresponding comparison results;
and obtaining the residential area base station corresponding to the number based on the comparison result.
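One way to read the residence base station steps is sketched below: count the distinct days on which each base station appears in each period, keep the top station per period, and accept it only if its day count reaches that period's threshold. How the per-period comparison results are combined is not spelled out in the text, so the priority order used here, the period names, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def top_station(records):
    """records: iterable of (date, station_id) pairs.

    Returns (station, days) for the station seen on the most distinct days,
    or (None, 0) when there are no records. Ties are broken by station id.
    """
    days = defaultdict(set)
    for date, station in records:
        days[station].add(date)
    if not days:
        return None, 0
    station = max(days, key=lambda s: (len(days[s]), s))
    return station, len(days[station])


def residence_station(records_by_period, thresholds,
                      priority=("workday_night", "public_holiday", "workday_day")):
    """Assumed combination rule: first period, in priority order, whose top
    station meets its threshold decides the residence base station."""
    for period in priority:
        station, n_days = top_station(records_by_period.get(period, []))
        if station is not None and n_days >= thresholds[period]:
            return station
    return None


# Hypothetical vacation records for one number.
records = {
    "workday_night": [("d1", "A"), ("d2", "A"), ("d2", "B")],
    "public_holiday": [("d3", "C")],
    "workday_day": [("d1", "B"), ("d2", "B"), ("d3", "B")],
}
thresholds = {"workday_night": 3, "public_holiday": 2, "workday_day": 3}
home = residence_station(records, thresholds)
```

The same counting helper could serve for the workplace base station by feeding it non-vacation records instead.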
In an alternative approach, the program 1710 causes the processor 1702 to:
forming a corresponding residence number set for each residence base station based on the residence base station corresponding to each number;
obtaining a residence accompanying number set corresponding to each number from the obtained residence number set;
forming a corresponding work place number set for each work place base station based on the work place base station corresponding to each number;
and obtaining a working accompanying number set corresponding to each number from the working number set.
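Grouping numbers by their base station and deriving each number's accompanying set can be sketched as follows. The sketch treats the accompanying set of a number as every other number sharing its base station, which is an assumption about how the sets are defined; the function names are hypothetical. The same functions apply to residence and workplace base stations alike.

```python
from collections import defaultdict


def group_by_station(station_of):
    """station_of: {number: station_id or None}. Returns {station: {numbers}}."""
    groups = defaultdict(set)
    for number, station in station_of.items():
        if station is not None:
            groups[station].add(number)
    return dict(groups)


def companion_sets(station_of):
    """For each number with a known station, the other numbers at that station."""
    groups = group_by_station(station_of)
    return {
        number: groups[station] - {number}
        for number, station in station_of.items()
        if station is not None
    }


# Hypothetical residence base stations for four numbers.
station_of = {"1": "A", "2": "A", "3": "B", "4": None}
groups = group_by_station(station_of)
companions = companion_sets(station_of)
```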
In an alternative approach, the set of social relationships includes a family number set, a classmate number set and a friend number set; the program 1710 causes the processor 1702 to obtain the set of social relationships based on the student number set and the companion number set, specifically by:
acquiring a family number set of the student based on the residence accompanying number set corresponding to the student's number and the contact number set corresponding to that number;
extracting a classmate number set of the student from the workplace number set corresponding to the number;
and obtaining a friend number set based on the classmate number set and the contact number set corresponding to the number.
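The text names the inputs of each social relationship set but not the exact set operation. The sketch below assumes plain intersections: family members are residence companions who are also contacts, classmates are workplace companions who are also identified students, and friends are classmates who are also contacts. These rules and all names are illustrative assumptions.

```python
def social_relationships(residence_companions, workplace_numbers,
                         contacts, student_numbers):
    """All arguments are sets of phone numbers for one student.

    Returns the family, classmate, and friend sets under the assumed
    intersection rules described in the lead-in.
    """
    family = residence_companions & contacts
    classmates = workplace_numbers & student_numbers
    friends = classmates & contacts
    return {"family": family, "classmates": classmates, "friends": friends}


# Hypothetical inputs for one student.
relations = social_relationships(
    residence_companions={"p1", "p2", "x"},
    workplace_numbers={"c1", "c2", "w"},
    contacts={"p1", "c1", "z"},
    student_numbers={"c1", "c2", "s9"},
)
```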
In an alternative embodiment, where the attribute data is the real-time mobile phone data stream, the program 1710 causes the processor 1702 to: identify the numbers of students from the campus roaming number set based on the real-time mobile phone data stream, and output an identification result based on the mobile phone data stream.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
and repeating the steps until an identification result corresponding to each number in the campus roaming number set is obtained, and obtaining a student number list.
In an alternative, the program 1710 causes the processor 1702 to:
acquiring position data of a corresponding number appearing in a bedtime period, wherein the position data comprises a plurality of corresponding grids;
matching corresponding users based on the position data appearing in the bedtime period to obtain a first matching result;
acquiring position data of the corresponding number in at least one department activity time period;
matching the corresponding department based on the acquired position data to obtain a second matching result;
and obtaining an identification result based on the first matching result and the second matching result.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring the occurrence frequency of each grid in the sleeping time period;
selecting a preset number of grids from the plurality of grids appearing in the bedtime period, wherein the occurrence frequency of any selected grid in the bedtime period is more than the occurrence frequency of any grid which is not selected in the plurality of grids in the bedtime period;
and respectively matching the grids of each dormitory of the corresponding campus with the selected grids to obtain a first matching result.
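The dormitory matching steps above can be sketched as follows: count how often each grid appears during the sleeping time period, keep the most frequent ones, and pick the dormitory whose grids overlap the selection most. The overlap-count matching rule, the tie-breaking by grid id, and the names are assumptions; the text itself requires every selected grid to be strictly more frequent than every unselected one, which this sketch does not enforce when counts tie at the boundary.

```python
from collections import Counter


def top_grids(grid_observations, k):
    """Return the k most frequently observed grids (ties broken by grid id)."""
    counts = Counter(grid_observations)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [grid for grid, _ in ranked[:k]]


def match_dormitory(selected_grids, dormitory_grids):
    """dormitory_grids: {dormitory: set of grids}.

    Assumed rule: the dormitory sharing the most grids with the selection
    wins; None if nothing overlaps.
    """
    best, best_overlap = None, 0
    for dorm, grids in sorted(dormitory_grids.items()):
        overlap = len(set(selected_grids) & set(grids))
        if overlap > best_overlap:
            best, best_overlap = dorm, overlap
    return best


# Hypothetical night-time observations for one number.
night = ["g1", "g1", "g1", "g2", "g2", "g3"]
selected = top_grids(night, k=2)
dorm = match_dormitory(selected, {"D1": {"g1", "g2"}, "D2": {"g3", "g4"}})
```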
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring, based on the position data, grid data of the corresponding number for each department activity time period, wherein the grid data comprises the grids and their occurrence counts;
and matching the grid with the most occurrences against the position of the corresponding department to obtain a second matching result.
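The second matching result can be sketched in the same style: take the grid seen most often during the activity time periods and look it up in a grid-to-department table. The lookup table, the tie-breaking by grid id, and the names are hypothetical.

```python
from collections import Counter


def match_department(grid_observations, department_of_grid):
    """Return the department located at the most frequently observed grid.

    grid_observations: list of grid ids seen during activity time periods.
    department_of_grid: {grid id: department name} (assumed lookup table).
    Returns None when there are no observations or the grid is unmapped.
    """
    if not grid_observations:
        return None
    counts = Counter(grid_observations)
    top = min(counts, key=lambda g: (-counts[g], g))  # most frequent grid
    return department_of_grid.get(top)


# Hypothetical observations and grid table.
department = match_department(["g5", "g5", "g7"], {"g5": "Physics", "g7": "Math"})
```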
In an alternative, where the attribute data is behavior data, the program 1710 causes the processor 1702 to: identify students from the campus roaming number set based on the behavior data, and output identification results.
In an alternative, the program 1710 causes the processor 1702 to:
acquiring a mobile phone data stream set from a base station associated with a campus, wherein the mobile phone data stream comprises a plurality of numbers and corresponding behavior data;
matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
acquiring behavior data corresponding to a number from the mobile phone data stream subset, wherein the behavior data comprises access page data, APP data, occurrence time and a base station which correspondingly appears;
substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
performing column normalization processing on the behavior matrix to obtain a processed behavior matrix;
calculating corresponding behavior correlation values based on the processed behavior matrix and a standard student behavior matrix;
comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
and acquiring behavior data corresponding to another number from the mobile phone data stream subset again, and repeating the steps until the numbers corresponding to the mobile phone data stream subset are all identified to obtain an identification result corresponding to the campus.
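The matrix steps above (column normalization, correlation against a standard student behavior matrix, comparison with a standard value) can be sketched as below. The text does not define the normalization or the correlation measure here, so the sketch assumes that each column is scaled to sum to 1 and that the behavior correlation value is the Pearson correlation of the flattened matrices; the names and numbers are hypothetical.

```python
import math


def column_normalize(matrix):
    """Scale each column to sum to 1 (assumed meaning of column normalization)."""
    sums = [sum(col) for col in zip(*matrix)]
    return [[v / s if s else 0.0 for v, s in zip(row, sums)] for row in matrix]


def correlation(a, b):
    """Pearson correlation of two equally shaped matrices, flattened.

    This is an assumed stand-in for the behavior correlation value.
    Returns 0.0 when either matrix has no variation.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def is_student(behavior, standard, r_standard):
    """Compare the correlation value with the standard correlation value."""
    return correlation(column_normalize(behavior), standard) >= r_standard


# Hypothetical 2x2 behavior matrix (e.g. rows: time slots, columns: app types).
behavior = [[1, 2], [3, 2]]
normalized = column_normalize(behavior)
```

Repeating is_student over every number in the mobile phone data stream subset yields the per-campus identification result.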
In an alternative approach, the program 1710 causes the processor 1702 to: inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list; the merging model comprises: the neural network comprises a first layer of neural network and a second layer of neural network, wherein the first layer of neural network comprises three neurons, and the second layer of neural network comprises two neurons.
The embodiment of the invention firstly obtains the campus roaming number set, identifies each number of the campus roaming number set based on the activity record of the user of the mobile phone number, the real-time mobile phone data stream, the user behavior data and the like, outputs the student number list, and can improve the accuracy of student number identification.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore, may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those of skill in the art will appreciate that while some embodiments herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (5)

1. A student identification method, the method comprising:
acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value;
obtaining attribute data of the plurality of roaming numbers, each of the attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user;
acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
acquiring base station data corresponding to each number based on the activity record set, wherein the base station data comprises a residential area base station and a working area base station;
acquiring numbers appearing in the same base station based on base station data corresponding to each number to obtain a number set of the corresponding base station, wherein the number set comprises a target number set and an accompanying number set;
matching pre-acquired campus base station data with each workplace number set to form a student number set corresponding to the campus and an accompanying number set of each number;
acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
acquiring position data of a corresponding number appearing in a sleeping time period, wherein the position data comprises a plurality of corresponding grids;
matching corresponding users based on the position data appearing in the bedtime period to obtain a first matching result;
acquiring position data of the corresponding number in at least one department activity time period;
matching the corresponding department based on the acquired position data to obtain a second matching result;
obtaining an identification result based on the first matching result and the second matching result;
repeating the steps until an identification result corresponding to each number in the campus roaming number set is obtained, and obtaining a student number list;
acquiring a mobile phone data stream set from a base station associated with a campus, wherein the mobile phone data stream comprises a plurality of numbers and corresponding behavior data;
matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
acquiring behavior data corresponding to a number from the mobile phone data stream subset, wherein the behavior data comprises access page data, occurrence time and a base station which correspondingly appears;
substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
performing column normalization processing on the behavior matrix to obtain a processed behavior matrix;
calculating corresponding behavior correlation values based on the processed behavior matrix and a standard student behavior matrix;
comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
and acquiring behavior data corresponding to another number from the mobile phone data stream subset again, and repeating the steps until the numbers corresponding to the mobile phone data stream subset are all identified to obtain an identification result corresponding to the campus.
2. The method of claim 1, wherein a student is identified from the set of campus roaming numbers based on the attribute data, further comprising:
inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list; the merging model comprises: the neural network comprises a first layer of neural network and a second layer of neural network, wherein the first layer of neural network comprises three neurons, and the second layer of neural network comprises two neurons.
3. A student identification device, the device comprising:
a set acquisition module, configured to acquire a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value;
a data obtaining module, configured to obtain attribute data of the roaming numbers, where each attribute data includes one or more of the following: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user;
an identification module to identify a student from the set of campus roaming numbers based on the attribute data;
the identification module is further used for acquiring activity records of each number in the campus roaming number set in a preset time period and summarizing the activity records into an activity record set;
the identification module is further configured to obtain base station data corresponding to each number based on the activity record set, where the base station data includes a residential base station and a work site base station;
the identification module is further used for acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of the corresponding base station, wherein the number set comprises a target number set and an accompanying number set;
the identification module is further used for matching pre-acquired campus base station data with each work place number set to form a student number set corresponding to the campus and an accompanying number set of each number;
the identification module is further used for acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
the identification module is further used for acquiring position data of the corresponding number in a sleeping time period, wherein the position data comprises a plurality of corresponding grids;
the identification module is further used for matching corresponding users based on the position data appearing in the bedtime period to obtain a first matching result;
the identification module is further used for acquiring position data of the corresponding number in at least one department activity time period;
the identification module is further used for matching the corresponding department based on the acquired position data to obtain a second matching result;
the identification module is further used for obtaining an identification result based on the first matching result and the second matching result;
the identification module is further used for repeating the steps until an identification result corresponding to each number in the campus roaming number set is obtained, and a student number list is obtained;
the identification module is further configured to acquire a mobile phone data stream set from a base station associated with the campus, where the mobile phone data stream includes a plurality of numbers and corresponding behavior data;
the identification module is further configured to match the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
the identification module is further configured to obtain behavior data corresponding to a number from the mobile phone data stream subset, where the behavior data includes access page data, occurrence time, and a base station that appears correspondingly;
the identification module is further used for substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
the identification module is further configured to perform column normalization processing on the behavior matrix to obtain a processed behavior matrix;
the identification module is further used for calculating corresponding behavior correlation values based on the processed behavior matrix and a standard student behavior matrix;
the identification module is further used for comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
the identification module is further configured to obtain behavior data corresponding to another number from the mobile phone data stream subset again, and repeat the above steps until all numbers corresponding to the mobile phone data stream subset are identified, so as to obtain an identification result corresponding to the campus.
4. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the student identification method according to any one of claims 1-2.
5. A computer-readable storage medium having stored therein at least one executable instruction for causing a processor to perform the steps of the student identification method according to any one of claims 1-2.
CN201910990107.8A 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium Active CN112685654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910990107.8A CN112685654B (en) 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910990107.8A CN112685654B (en) 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium

Publications (2)

Publication Number Publication Date
CN112685654A CN112685654A (en) 2021-04-20
CN112685654B CN112685654B (en) 2023-04-07

Family

ID=75444648

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201910990107.8A Active CN112685654B (en) 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium

Country Status (1)

Country Link
CN (1) CN112685654B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979958A (en) * 2022-06-08 2022-08-30 中国联合网络通信集团有限公司 Juvenile user identification method, juvenile user identification platform, computer equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102047697A (en) * 2008-05-27 2011-05-04 高通股份有限公司 Methods and apparatus for generating user profile based on periodic location fixes
CN106658394A (en) * 2015-11-04 2017-05-10 ***通信集团公司 High-speed rail user separation method and apparatus thereof
CN107155214A (en) * 2016-03-02 2017-09-12 ***通信集团河北有限公司 A kind of number determines method and apparatus
CN108537909A (en) * 2018-03-23 2018-09-14 广州米度信息科技有限公司 A kind of the personnel's detection method and big data analysis system of unaware
CN109949063A (en) * 2017-12-20 2019-06-28 中移(苏州)软件技术有限公司 A kind of address determines method, apparatus, electronic equipment and readable storage medium storing program for executing

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2013055675A1 (en) * 2011-10-10 2013-04-18 Hummel Brett Patrick System & method for tracking members of an affinity group

Non-Patent Citations (1)

Title
"河北移动经营分析***中高校市场综合分析子***的分析与设计";张琳;《中国优秀硕士学位论文全文数据库 信息科技辑》;20110915(第09期);第I138-405页 *

Also Published As

Publication number Publication date
CN112685654A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
Hasan et al. Urban activity pattern classification using topic models from online geo-location data
CN105532030B (en) For analyzing the devices, systems, and methods of the movement of target entity
JP5248915B2 (en) GPS tracking and learning of user behavior preferences from well-known nearby destinations
CN106796550A (en) Information coordinating and delivering device and method
Roy et al. Modeling the dynamics of hurricane evacuation decisions from twitter data: An input output hidden markov modeling approach
CN108062366B (en) Public culture information recommendation system
CN110472057B (en) Topic label generation method and device
US20170206454A1 (en) Method and system for providing type information and evaluation information, using data collected from user terminal
Park et al. Application of graph theory to mining the similarity of travel trajectories
Belasco et al. Using a finite mixture model of heterogeneous households to delineate housing submarkets
Cho et al. Classifying tourists’ photos and exploring tourism destination image using a deep learning model
Dadashpour Moghaddam et al. A GIS-based assessment of urban tourism potential with a branding approach utilizing hybrid modeling
CN112685654B (en) Student identification method and device, computing equipment and readable computer storage medium
Mia et al. Registration status prediction of students using machine learning in the context of Private University of Bangladesh
CN113469752A (en) Content recommendation method and device, storage medium and electronic equipment
CN108133296B (en) Event attendance prediction method combining environmental data under social network based on events
Sun et al. Automatic building age prediction from street view images
CN112287243B (en) Service information recommendation device and method based on collaborative filtering algorithm
CN108647189A (en) A kind of method and device of identification user crowd's attribute
US11501100B1 (en) Computer processes for clustering properties into neighborhoods and generating neighborhood-specific models
CN114611622A (en) Method for identifying cross-city commuting crowd by utilizing mobile phone data
CN113743838A (en) Target user identification method and device, computer equipment and storage medium
Zhang An approach to localness assessment of social media users
Petkov EVALUATION OF SPATIAL DATA’S IMPACT IN MID-TERM ROOM RENT PRICE THROUGH APPLICATION OF SPATIAL ECONOMETRICS AND MACHINE LEARNING
Subramanian et al. Predictive Modeling and Mobility Pattern Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant