WO2020134299A1 - Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium - Google Patents

Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium

Publication number
WO2020134299A1
WO2020134299A1 PCT/CN2019/109438 CN2019109438W WO2020134299A1 WO 2020134299 A1 WO2020134299 A1 WO 2020134299A1 CN 2019109438 W CN2019109438 W CN 2019109438W WO 2020134299 A1 WO2020134299 A1 WO 2020134299A1
Authority
WO
WIPO (PCT)
Prior art keywords
random forest
training
outdoor
indoor
classification model
Prior art date
Application number
PCT/CN2019/109438
Other languages
English (en)
French (fr)
Inventor
钟勇才
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2020134299A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • This article relates to the field of communications, and in particular to a method for distinguishing indoor and outdoor tags, a training method for classifiers, equipment and media.
  • LBS Location Based Service
  • Some mobile services occur indoors and others occur outdoors. For a specific room, accurately determining whether a mobile service user is indoors or outdoors is crucial. For example, distinguishing indoor users from outdoor users helps operators accurately identify deep-coverage problems and tailor precise site-addition plans accordingly.
  • If indoor coverage is insufficient, an indoor distributed site is added; if outdoor coverage is insufficient, an outdoor site is added. For elderly people or children who need care, indoor/outdoor distinction can determine whether they are inside a room or area. Likewise, a user can access the network inside the company but can no longer access company information after leaving the office building.
  • Indoor/outdoor distinction for mobile services requires both high real-time performance and high accuracy.
  • In some cases, determining whether a mobile user is indoors or outdoors suffers from low efficiency, a high misjudgment rate, and no guarantee of real-time performance.
  • The technical problem to be solved herein is to provide a method for distinguishing indoor and outdoor marks, a classifier training method, a device, and a medium, so as to at least solve the problem of a high misjudgment rate in determining the indoor and outdoor marks of users.
  • A method for distinguishing indoor and outdoor marks of users in the embodiments herein includes: collecting measurement report data of a target user; inputting the measurement report data of the target user into a random forest classifier used to classify the indoor and outdoor marks of users; and determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.
  • A training method for a random forest classifier in the embodiments herein includes: extracting a training data set from the collected measurement report data of sample users in a target area and the actual indoor and outdoor mark corresponding to each piece of training data; inputting the training data set into a preset random forest classification model for training; during training, searching for the optimal model parameters of the random forest classification model through a grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
  • A communication node device in the embodiments herein includes a memory and a processor. The memory stores a program for marking users as indoor or outdoor, and the processor executes the computer program to implement the steps of the above distinguishing method.
  • A random forest classifier training device in the embodiments herein includes a memory and a processor. The memory stores a random forest classifier training program, and the processor executes the computer program to implement the steps of the above training method.
  • A computer-readable storage medium in the embodiments herein stores a program for marking users as indoor or outdoor, and the computer program may be executed by at least one processor to implement the steps of the above distinguishing method.
  • A computer-readable storage medium in the embodiments herein stores a training program of a random forest classifier, and the computer program may be executed by at least one processor to implement the steps of the above training method.
  • FIG. 1 is a flowchart of a method for distinguishing indoor and outdoor marks of users in an embodiment of this document;
  • FIG. 2 is a flowchart of an optional method for distinguishing indoor and outdoor marks of users in an embodiment of this document;
  • FIG. 3 is a prediction effect diagram of indoor and outdoor marks of a target user in the embodiment of this document.
  • The embodiments herein provide a method for distinguishing indoor and outdoor marks of users.
  • The method includes: S101, collecting measurement report (MR, Measurement Report) data of a target user; S102, inputting the measurement report data of the target user into a random forest classifier used to classify the indoor and outdoor mark POSITIONMARK_REAL of users; S103, determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.
  • the target user refers to the user to be located, and the user generally refers to the mobile user.
  • MR records the mobile user's serving cell ID (identification), RSRP (test power value), RSRQ (LTE reference signal received quality), TA_CALC (delay), AOA (incidence angle), STARTTIME (start time), ENDTIME (end time), IMSI (International Mobile Subscriber Identity), and other wireless measurement information.
  • the MR data of the target user collected in the embodiment of this document includes AOA (angle of incidence), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay), TIME_DIFFERENCE (time difference endtime-starttime).
  • Indoor and outdoor marks are used to mark whether a user is indoor or outdoor, and can also be described as indoor or outdoor marks or indoor and outdoor marks.
  • the method in the embodiment of this document can be applied to the communication node side, for example, the base station side; in the determination process, the base station can collect MR data of the target user in real time, so the MR data in the embodiment of this document can also be described as real-time MR data. Since the determination process is realized by the classification calculation of the random forest classifier, the determination process is also a prediction process.
  • the collected target user's MR data is input to a random forest classifier for classification calculation, so that the indoor and outdoor marks of the target user can be determined according to the classification calculation, and thus the false judgment rate can be effectively reduced in determining the indoor and outdoor marks of the user, and Judgment based on MR data effectively guarantees the real-time nature of the process of determining user indoor and outdoor marking.
  • In some implementations, before the measurement report data of the target user is input into the random forest classifier used to classify the indoor and outdoor marks of users, the method includes: collecting the measurement report data of sample users in the target area, together with the indoor or outdoor label corresponding to each piece of measurement report data; extracting a training data set from the collected measurement report data of the sample users in the target area and the actual indoor and outdoor mark corresponding to each piece of training data; inputting the training data set into a preset random forest classification model for training; during training, searching for the optimal model parameters of the random forest classification model through a GRIDSEARCHCV grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
  • The target area may be a designated area, and the model parameters may include the number of decision trees N_ESTIMATORS and the computation attribute CRITERION. The random forest classification model may be implemented in Python code, and the random forest classification model in the embodiments herein may be referred to simply as the model.
  • Before training, the collected measurement report data of the sample users in the target area and the indoor or outdoor label corresponding to each piece of measurement report data can be used as the original data, and the original data is preprocessed to remove abnormal data.
  • Features such as AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay), and TIME_DIFFERENCE (time difference, endtime - starttime) are extracted from the training data set as the independent variable X, and the corresponding POSITIONMARK_REAL (indoor and outdoor mark) is set as the dependent variable Y.
  • The independent variable X is used to determine the indoor and outdoor mark Y. That is, each piece of training data in the training data set is set as an independent variable, and the actual indoor and outdoor mark corresponding to each piece of training data is set as the dependent variable determined by it. This can be treated as a 0-1 classification problem, which effectively reduces the complexity of training the random forest classifier and effectively improves the prediction accuracy of the indoor and outdoor marks of users.
  • The trained random forest classification model can also be validated by prediction on a test data set, and this prediction validation helps guarantee the prediction accuracy of the resulting indoor and outdoor marks of users.
  • Data preprocessing is performed on the original data to remove abnormal data, feature values are extracted from the cleaned original data to obtain a data set, and the data set is divided into a training data set and a test data set.
  • The test data set is repeatedly input into the trained random forest model for cross-prediction validation until a relatively good model is found and used as the final random forest classification model.
  • Using the random forest classification model corresponding to the optimal model parameters as the random forest classifier may include: extracting a test data set from the sample measurement report data; inputting the test data set into the random forest classification model corresponding to the optimal model parameters for prediction validation; determining the minimum mean square error between the prediction validation result and the actual indoor and outdoor marks corresponding to the test data set; when the mean square error is not greater than a preset threshold, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier; and when the mean square error is greater than the threshold, searching again for the optimal model parameters of the random forest classification model through a grid search.
  • The embodiments herein provide a specific method for distinguishing indoor and outdoor marks of users.
  • The method is mainly divided into two stages: an offline stage and an online stage.
  • The offline stage is mainly used for training the random forest classifier.
  • The online stage is mainly used for real-time prediction for the target user. The method includes:
  • Step 201 Collect MR data of the sample user in the target area.
  • MR data records the user's serving cell ID, TA_CALC, RSRP, RSRQ, TA, AOA, MRTIME, STARTTIME, ENDTIME, IMSI, and other wireless measurement information during the service process, and the POSITIONMARK_REAL indoor and outdoor mark corresponding to each piece of measurement information.
  • Step 202 abnormal data processing.
  • Step 203 Select MR data corresponding to the feature value.
  • the characteristic values such as AOA (angle of incidence), TA_CALC (delay), RSRP (test power value), TADLTVALUE (downlink delay), TIME_DIFFERENCE (time difference endtime-starttime) are selected as independent variables X .
  • the corresponding POSITIONMARK_REAL (indoor and outdoor mark) is set as the dependent variable Y.
  • the random forest classification model has better accuracy and generalization.
  • Step 204 Train the model to optimize model parameters.
  • Step 205: Mechanism for measuring model accuracy.
  • Cross-prediction validation determines the minimum mean square error between the predicted values and the true values of the test set data.
  • The smaller the error, the better the model; the larger the error, the worse the model.
  • The accuracy of each model on the prediction data set is recorded, and the model with the highest accuracy is selected and saved. When the mean square error is not greater than a preset threshold, the random forest classification model corresponding to the optimal model parameters serves as the random forest classifier; when the mean square error is greater than the threshold, the grid search for the optimal model parameters of the random forest classification model is run again.
  • Step 206: Collect real-time MR data of the target user.
  • The real-time MR data includes at least the following indicators: AOA (angle of incidence), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink delay), and TIME_DIFFERENCE (time difference, endtime - starttime). These indicators are then used to predict the indoor and outdoor marks of mobile users.
  • Step 207 Real-time MR data preprocessing.
  • Step 208 Real-time MR data is input into a random forest classifier for prediction.
  • the processed real-time MR data is input into the previously trained random forest classifier, and then fitted by the random forest classifier.
  • Step 209 The indoor and outdoor marking results corresponding to the real-time MR data of these target users can be obtained.
  • the embodiments herein effectively improve the prediction accuracy of indoor and outdoor markings of users, and effectively ensure the real-time nature of the process of determining indoor and outdoor markings of users.
  • the embodiments herein provide a training method for a random forest classifier.
  • The method includes: extracting a training data set from the collected measurement report data of sample users in the target area and the actual indoor and outdoor mark corresponding to each piece of training data; inputting the training data set into a preset random forest classification model for training; during training, searching for the optimal model parameters of the random forest classification model through a grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
  • the training process of the random forest classifier of the embodiment of this document is the same as the training process of the first embodiment.
  • The embodiments herein provide a communication node device. The device includes a memory and a processor, the memory stores a program for marking users as indoor or outdoor, and the processor executes the computer program to implement the steps of the method according to either of the first and second embodiments.
  • The communication node device may be a base station or the like.
  • The embodiments herein provide a random forest classifier training device. The device includes a memory and a processor, the memory stores a random forest classifier training program, and the processor executes the computer program to implement the steps of the method of the third embodiment.
  • The embodiments herein provide a computer-readable storage medium. The storage medium stores a program for marking users as indoor or outdoor, and the computer program may be executed by at least one processor to implement the steps of the method according to either of the first and second embodiments.
  • The embodiments herein provide a computer-readable storage medium. The storage medium stores a random forest classifier training program, and the computer program may be executed by at least one processor to implement the steps of the method of the third embodiment.
  • The collected MR data of the target user is input into a random forest classifier for classification calculation, so that the indoor and outdoor mark of the target user can be determined from the classification calculation. This effectively reduces the misjudgment rate in determining the indoor and outdoor marks of users, and because the judgment is based on MR data, it effectively guarantees the real-time nature of the determination process.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

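A method for distinguishing indoor and outdoor marks, a classifier training method, a device, and a medium. The distinguishing method includes: collecting measurement report data of a target user (S101); inputting the measurement report data of the target user into a random forest classifier used to classify the indoor and outdoor marks of users (S102); and determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier (S103).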

Description

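Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium
This document claims priority to Chinese patent application CN201811595402.5, filed on December 25, 2018 and entitled "Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium", the entire contents of which are incorporated herein by reference.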
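Technical Field
This document relates to the field of communications, and in particular to a method for distinguishing indoor and outdoor marks, a classifier training method, a device, and a medium.
Background
In the mobile Internet era, smart terminals have changed the way people live and behave. People habitually use location-based services (LBS, Location Based Service) to find shopping malls, hospitals, and banks, and even to make friends. Some of these mobile services take place indoors and others outdoors. For a specific room, accurately determining whether a mobile service user is indoors or outdoors is crucial. For example, distinguishing indoor users from outdoor users helps operators accurately identify deep-coverage problems and tailor precise site-addition plans accordingly: if indoor coverage is insufficient, an indoor distributed site is added; if outdoor coverage is insufficient, an outdoor site is added. For elderly people or children who need care, indoor/outdoor distinction can determine whether they are inside a room or area. Likewise, a user can access the network inside the company but can no longer access company information after leaving the office building.
An analysis of the requirements of these applications shows that indoor/outdoor distinction for mobile services demands both high real-time performance and high accuracy. In some cases, however, determining whether a mobile user is indoors or outdoors suffers from low efficiency, a high misjudgment rate, and no guarantee of real-time performance.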
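Summary
To overcome the above defects, the technical problem to be solved herein is to provide a method for distinguishing indoor and outdoor marks, a classifier training method, a device, and a medium, so as to at least solve the problem of a high misjudgment rate in determining the indoor and outdoor marks of users.
To solve the above technical problem, a method for distinguishing indoor and outdoor marks of users in the embodiments herein includes: collecting measurement report data of a target user; inputting the measurement report data of the target user into a random forest classifier used to classify the indoor and outdoor marks of users; and determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.
To solve the above technical problem, a training method for a random forest classifier in the embodiments herein includes: extracting a training data set from the collected measurement report data of sample users in a target area and the actual indoor and outdoor mark corresponding to each piece of training data; inputting the training data set into a preset random forest classification model for training; during training, searching for the optimal model parameters of the random forest classification model through a grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
To solve the above technical problem, a communication node device in the embodiments herein includes a memory and a processor. The memory stores a program for marking users as indoor or outdoor, and the processor executes the computer program to implement the steps of the above distinguishing method.
To solve the above technical problem, a training device for a random forest classifier in the embodiments herein includes a memory and a processor. The memory stores a training program of the random forest classifier, and the processor executes the computer program to implement the steps of the above training method.
To solve the above technical problem, a computer-readable storage medium in the embodiments herein stores a program for marking users as indoor or outdoor, and the computer program may be executed by at least one processor to implement the steps of the above distinguishing method.
To solve the above technical problem, a computer-readable storage medium in the embodiments herein stores a training program of a random forest classifier, and the computer program may be executed by at least one processor to implement the steps of the above training method.
The above is only an overview of the technical solution herein. So that the technical means herein can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages herein are more readily apparent, specific embodiments are set out below.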
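Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a flowchart of a method for distinguishing indoor and outdoor marks of users in an embodiment herein;
FIG. 2 is a flowchart of an optional method for distinguishing indoor and outdoor marks of users in an embodiment herein;
FIG. 3 is a diagram of the prediction results for the indoor and outdoor marks of target users in an embodiment herein.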
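Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
Prefixes such as "first" and "second" used to distinguish elements serve only to facilitate the description and have no specific meaning in themselves.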
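Embodiment 1
This embodiment provides a method for distinguishing indoor and outdoor marks of users. As shown in FIG. 1, the method includes: S101, collecting measurement report (MR, Measurement Report) data of a target user; S102, inputting the measurement report data of the target user into a random forest classifier used to classify the indoor and outdoor mark POSITIONMARK_REAL of users; S103, determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.
Here, the target user is the user to be located, and a user generally means a mobile user. MR records wireless measurement information of the mobile user during the service process, such as the serving cell ID (identification), RSRP (test power value), RSRQ (LTE reference signal received quality), TA_CALC (time delay), AOA (incidence angle), STARTTIME (start time), ENDTIME (end time), and IMSI (International Mobile Subscriber Identity). The MR data of the target user collected in this embodiment includes AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay), and TIME_DIFFERENCE (time difference, endtime - starttime). The indoor and outdoor mark indicates whether a user is indoors or outdoors, and may also be described as an indoor or outdoor mark, or an indoor/outdoor mark.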
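The five-field feature vector described above can be sketched as a small Python record type. This is only an illustration: the class name `MRRecord`, the method `to_features`, the sample values, and the choice of seconds as the unit of TIME_DIFFERENCE are assumptions that do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Column order of the feature vector; the names follow the MR fields listed above.
FEATURES = ("AOA", "TA_CALC", "RSRP", "TADLTVALUE", "TIME_DIFFERENCE")


@dataclass
class MRRecord:
    """Hypothetical container for one measurement report."""
    aoa: float
    ta_calc: float
    rsrp: float
    tadltvalue: float
    starttime: datetime
    endtime: datetime

    def to_features(self) -> List[float]:
        # TIME_DIFFERENCE is endtime - starttime; seconds is an assumed unit.
        time_difference = (self.endtime - self.starttime).total_seconds()
        return [self.aoa, self.ta_calc, self.rsrp, self.tadltvalue, time_difference]


# Made-up sample values, for illustration only.
record = MRRecord(
    aoa=120.0,
    ta_calc=3.0,
    rsrp=-95.0,
    tadltvalue=1.5,
    starttime=datetime(2018, 12, 25, 10, 0, 0),
    endtime=datetime(2018, 12, 25, 10, 0, 5),
)
features = record.to_features()
```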
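The method in this embodiment can be applied on the communication node side, for example on the base station side. During the determination process, the base station can collect the MR data of the target user in real time, so the MR data in this embodiment can also be described as real-time MR data. Because the determination is carried out through the classification calculation of the random forest classifier, the determination process is also a prediction process.
In this embodiment, the collected MR data of the target user is input into a random forest classifier for classification calculation, so that the indoor and outdoor mark of the target user can be determined from the classification calculation. This effectively reduces the misjudgment rate in determining the indoor and outdoor marks of users, and because the judgment is based on MR data, it effectively guarantees the real-time nature of the determination process.
On the basis of the above embodiment, several specific and optional implementations are given below to refine and optimize this embodiment and make its solution more convenient and accurate to implement. It should be noted that the following implementations may be combined with one another in any way, provided they do not conflict.
To effectively guarantee real-time performance in determining the indoor and outdoor marks of users, in some implementations, before the measurement report data of the target user is input into the random forest classifier used to classify the indoor and outdoor marks of users, the method includes: collecting the measurement report data of sample users in the target area, together with the indoor or outdoor label corresponding to each piece of measurement report data; extracting a training data set from the collected measurement report data of the sample users in the target area and the actual indoor and outdoor mark corresponding to each piece of training data; inputting the training data set into a preset random forest classification model for training; during training, searching for the optimal model parameters of the random forest classification model through a GRIDSEARCHCV grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
Here, the target area may be a designated area, and the model parameters may include the number of decision trees N_ESTIMATORS and the computation attribute CRITERION. The random forest classification model may be implemented in Python code, and in this embodiment it may be referred to simply as the model. Before the training data set and the actual indoor and outdoor mark corresponding to each piece of training data are input into the preset random forest classification model for training, and in order to improve the prediction accuracy of the indoor and outdoor marks of users, the collected measurement report data of the sample users in the target area and the indoor or outdoor label corresponding to each piece of measurement report data can be used as the original data, and data preprocessing can be performed on the original data to remove abnormal data.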
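In the prediction process, features such as AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay), and TIME_DIFFERENCE (time difference, endtime - starttime) are extracted from the training data set as the independent variable X, the corresponding POSITIONMARK_REAL (indoor/outdoor mark) is set as the dependent variable Y, and the independent variable X is used to determine the indoor/outdoor mark Y. That is, each piece of training data in the training data set is set as an independent variable, and the actual indoor and outdoor mark corresponding to each piece of training data is set as the dependent variable determined by it. This can be treated as a 0-1 classification problem, which effectively reduces the complexity of training the random forest classifier and effectively improves the prediction accuracy of the indoor and outdoor marks of users.
In the prediction process, the trained random forest classification model can also be validated by prediction on a test data set, and this prediction validation helps guarantee the prediction accuracy of the resulting indoor and outdoor marks of users. That is, data preprocessing is performed on the original data to remove abnormal data, feature values are extracted from the cleaned original data to obtain a data set, and the data set is divided into two parts, a training data set and a test data set. The test data set is repeatedly input into the trained random forest model for cross-prediction validation until a relatively good model is found and used as the final random forest classification model. In other words, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier may include: extracting a test data set from the sample measurement report data; inputting the test data set into the random forest classification model corresponding to the optimal model parameters for prediction validation; determining the minimum mean square error between the prediction validation result and the actual indoor and outdoor marks corresponding to the test data set; when the mean square error is not greater than a preset threshold, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier; and when the mean square error is greater than the threshold, searching again for the optimal model parameters of the random forest classification model through a grid search.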
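Because the marks form a 0-1 classification problem, the mean square error used in this acceptance test has a simple interpretation. The derivation below is a worked illustration: the symbols n, y_i, ŷ_i, and τ (the preset threshold) are introduced here for convenience, and it assumes the classifier outputs hard 0/1 predictions.

```latex
% y_i is the actual mark and \hat{y}_i the predicted mark of test sample i, both in {0,1}.
\[
(y_i - \hat{y}_i)^2 =
\begin{cases}
0, & \hat{y}_i = y_i \\
1, & \hat{y}_i \neq y_i
\end{cases}
\qquad\Longrightarrow\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
= \frac{\#\{\, i : \hat{y}_i \neq y_i \,\}}{n}
= 1 - \text{accuracy}.
\]
% Acceptance rule from the text: keep the model if MSE <= \tau, otherwise rerun the grid search.
\[
\text{accept the model} \iff \mathrm{MSE} \le \tau .
\]
```

Under this assumption, a threshold of τ = 0.05, for example, corresponds to requiring at least 95% accuracy on the test data set.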
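Embodiment 2
Based on Embodiment 1, this embodiment provides a specific method for distinguishing indoor and outdoor marks of users. As shown in FIG. 2, the method is mainly divided into two stages, an offline stage and an online stage. The offline stage is mainly used for training the random forest classifier, and the online stage is mainly used for real-time prediction for the target user. The method includes:
Step 201: Collect MR data of sample users in the target area.
Select a designated area and collect, on the base station side, 12,000 MR data records reported by users. The MR data records wireless measurement information of the user during the service process, such as the serving cell ID, TA_CALC, RSRP, RSRQ, TA, AOA, MRTIME, STARTTIME, ENDTIME, and IMSI, as well as the POSITIONMARK_REAL indoor and outdoor mark corresponding to each piece of measurement information.
Step 202: Abnormal data processing.
Replace abnormal or null values in each field of the 12,000 collected MR records with 0, and apply orthogonal normalization to the entire data matrix. Randomly select 75% of the data set as the training set and 25% as the test set, and save them to two separate files.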
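Step 202 can be sketched with the Python standard library as follows. The source does not define "orthogonal normalization" precisely, so per-column min-max scaling is used here purely as an assumed stand-in; the function names (`clean`, `min_max_scale`, `split`), the seed, and the sample rows are hypothetical, and saving the two sets to files is omitted.

```python
import math
import random


def clean(rows):
    """Replace missing (None) or non-finite fields with 0, as step 202 describes."""
    return [
        [0.0 if v is None or not math.isfinite(v) else float(v) for v in row]
        for row in rows
    ]


def min_max_scale(rows):
    """Scale each column to [0, 1]; a constant column maps to 0 (assumed normalization)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split(rows, labels, train_fraction=0.75, seed=0):
    """Randomly assign 75% of the samples to training and the rest to testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


# Made-up rows of (AOA, TA_CALC, RSRP) with one null and one NaN field.
raw = [
    [120.0, 3.0, -95.0],
    [None, 5.0, -110.0],
    [45.0, float("nan"), -80.0],
    [300.0, 1.0, -70.0],
]
labels = [1, 1, 0, 0]

X = min_max_scale(clean(raw))
X_train, y_train, X_test, y_test = split(X, labels)
```

The order of operations (normalize the whole matrix, then split) follows the step as written; a stricter evaluation would fit the scaling on the training portion only.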
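Step 203: Select the MR data corresponding to the feature values.
Because MR records many indicator items, they heavily affect the computation and accuracy of the whole model. To improve the computation and accuracy of the model, feature values such as AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay), and TIME_DIFFERENCE (time difference, endtime - starttime) are selected as the independent variable X, and the corresponding POSITIONMARK_REAL (indoor/outdoor mark) is set as the dependent variable Y. This turns the problem into a mathematical one in which the independent variable X determines the indoor/outdoor mark Y, which can be treated as a 0-1 classification problem. In this embodiment, the random forest classification model has better accuracy and generalization. A random forest classification model is built in Python code, and the training data set is input into the RANDOMFORESTCLASSIFIER model to start training.
Step 204: Train the model and optimize the model parameters.
The training data set is input into the RANDOMFORESTCLASSIFIER model, and a GRIDSEARCHCV grid search is then used to find the optimal number of decision trees N_ESTIMATORS and computation attribute CRITERION of the random forest classification algorithm. The test data set is input into the trained model for cross-validation. If the error is small enough, the model is selected; otherwise, the model parameters continue to be adjusted until the error of the model on the test data is sufficiently small.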
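Step 204 maps naturally onto scikit-learn's `RandomForestClassifier` and `GridSearchCV`, which the identifiers in the text appear to refer to. The sketch below is an assumption-laden illustration rather than the applicant's implementation: the synthetic data from `make_classification` stands in for the five MR features, and the candidate values in `param_grid` and the 3-fold setting are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for five MR features and a 0/1 indoor-outdoor mark (not real MR data).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# 75% training, 25% test, matching the split described in step 202.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid over the two parameters named in the text; the candidate values are assumptions.
param_grid = {"n_estimators": [10, 50], "criterion": ["gini", "entropy"]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

# Model refit on the full training set with the best parameter combination.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
```

`search.best_params_` then holds the selected `n_estimators` and `criterion`, and `test_accuracy` is the held-out accuracy that step 205 compares against a threshold.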
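Step 205: Mechanism for measuring model accuracy.
The test set data is input into the trained random forest classification model for cross-prediction validation, which determines the minimum mean square error between the predicted values and the true values of the test set data. The smaller the error, the better the model; the larger the error, the worse the model. The accuracy of each model on the prediction data set is recorded, and the model with the highest accuracy is selected and saved. When the mean square error is not greater than a preset threshold, the random forest classification model corresponding to the optimal model parameters is used as the random forest classifier; when the mean square error is greater than the threshold, the grid search for the optimal model parameters of the random forest classification model is run again.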
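The acceptance logic of step 205 can be sketched independently of any particular library. In the snippet below, each candidate is a plain prediction function standing in for a model produced by one round of grid search; `select_model`, `max_rounds`, the threshold value, and the toy candidates are all hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Mean square error between actual and predicted 0/1 marks."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def select_model(candidates, X_test, y_test, threshold, max_rounds=5):
    """Evaluate candidates in order and stop at the first one within the threshold.

    `candidates` yields prediction functions, one per search round (hypothetical).
    Returns the best prediction function seen and its error.
    """
    best_predict, best_error = None, None
    for round_index, predict in enumerate(candidates):
        if round_index >= max_rounds:
            break
        error = mean_squared_error(y_test, [predict(x) for x in X_test])
        if best_error is None or error < best_error:
            best_predict, best_error = predict, error
        if error <= threshold:
            break  # good enough: keep this model as the classifier
    return best_predict, best_error


# Toy test set with one feature per sample.
X_test = [[0.1], [0.9], [0.4], [0.8]]
y_test = [0, 1, 0, 1]

candidates = [
    lambda x: 1,                # first round: always predicts mark 1
    lambda x: int(x[0] > 0.5),  # second round: thresholds the feature
]
chosen, chosen_error = select_model(candidates, X_test, y_test, threshold=0.1)
```

Capping the number of rounds is a design choice added here so that the loop terminates even if no candidate reaches the threshold; the source does not specify a stopping rule for that case.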
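Step 206: Collect real-time MR data of the target user.
Randomly select the target users of one area and collect, on the base station side, real-time MR data of some of the target users, including at least the indicators AOA (incidence angle), TA_CALC (time delay), RSRP (test power value), TADLTVALUE (downlink time delay), and TIME_DIFFERENCE (time difference, endtime - starttime). These indicators are then used to predict the indoor and outdoor marks of mobile users.
Step 207: Real-time MR data preprocessing.
The real-time data may contain abnormal or null values. These abnormal values are replaced with 0, and the indicators used by the trained model are selected as the feature values. Orthogonal normalization is applied to the feature value data, which can effectively avoid overfitting.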
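For the online stage, one reasonable reading of step 207 is that a live record is cleaned and scaled with the same per-feature bounds that were used for the training data, so that live and training features are comparable. The sketch below assumes min-max scaling and made-up bounds; `preprocess_live`, the dictionary layout of the record, and all numeric values are hypothetical.

```python
import math


def preprocess_live(record, feature_names, lows, highs):
    """Select the model's features from a live MR record, zero-fill bad values, and scale.

    `lows` and `highs` are per-feature bounds assumed to come from the training data.
    """
    vector = []
    for name, lo, hi in zip(feature_names, lows, highs):
        value = record.get(name)
        if value is None or not math.isfinite(value):
            value = 0.0  # abnormal or null values are replaced with 0
        vector.append((value - lo) / (hi - lo) if hi > lo else 0.0)
    return vector


feature_names = ["AOA", "TA_CALC", "RSRP"]
lows = [0.0, 0.0, -140.0]    # assumed training minima
highs = [360.0, 10.0, -44.0]  # assumed training maxima

# Live record with an extra field (IMSI) that the model does not use and a null TA_CALC.
live_record = {"AOA": 180.0, "TA_CALC": None, "RSRP": -92.0, "IMSI": 0.0}
live_vector = preprocess_live(live_record, feature_names, lows, highs)
```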
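Step 208: Input the real-time MR data into the random forest classifier for prediction.
As shown in FIG. 3, the preprocessed real-time MR data is input into the previously trained random forest classifier and fitted by the random forest classifier.
Step 209: The indoor or outdoor marking results corresponding to the real-time MR data of these target users are obtained.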
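Steps 208 and 209 amount to calling the trained classifier on the preprocessed live vectors and reading off the marks. The sketch below uses scikit-learn's `RandomForestClassifier` on a tiny made-up training set; the two features, the sample values, and the mapping of 1 to "indoor" and 0 to "outdoor" are assumptions, since the source does not state which value denotes which mark.

```python
from sklearn.ensemble import RandomForestClassifier

# Tiny hypothetical training set of two normalized features per sample.
X_train = [
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
]
y_train = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = indoor, 0 = outdoor

classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)

# Assumed mapping from the 0/1 mark to a readable label.
LABELS = {0: "outdoor", 1: "indoor"}

# Preprocessed real-time vectors for two target users.
live = [[0.12, 0.88], [0.88, 0.12]]
marks = [LABELS[int(p)] for p in classifier.predict(live)]
```

In a deployment along the lines of FIG. 2, the classifier would be the model saved in step 205 rather than one trained inline as shown here.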
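This embodiment effectively improves the prediction accuracy of the indoor and outdoor marks of users and effectively guarantees real-time performance in the process of determining them.
Embodiment 3
This embodiment provides a training method for a random forest classifier. The method includes: extracting a training data set from the collected measurement report data of sample users in the target area and the actual indoor and outdoor mark corresponding to each piece of training data; inputting the training data set into a preset random forest classification model for training; during training, searching for the optimal model parameters of the random forest classification model through a grid search; and using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
The training process of the random forest classifier in this embodiment is the same as the training process of Embodiment 1. For the specific implementation, refer to Embodiment 1; the corresponding technical effects are achieved.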
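Embodiment 4
This embodiment provides a communication node device. The device includes a memory and a processor, the memory stores a program for marking users as indoor or outdoor, and the processor executes the computer program to implement the steps of the method according to either of Embodiment 1 and Embodiment 2. The communication node device may be a base station or the like.
Embodiment 5
This embodiment provides a training device for a random forest classifier. The device includes a memory and a processor, the memory stores a training program of the random forest classifier, and the processor executes the computer program to implement the steps of the method according to Embodiment 3.
Embodiment 6
This embodiment provides a computer-readable storage medium. The storage medium stores a program for marking users as indoor or outdoor, and the computer program may be executed by at least one processor to implement the steps of the method according to either of Embodiment 1 and Embodiment 2.
Embodiment 7
This embodiment provides a computer-readable storage medium. The storage medium stores a training program of a random forest classifier, and the computer program may be executed by at least one processor to implement the steps of the method according to Embodiment 3.
It should be noted that for the specific implementation of Embodiments 3 to 7, refer to Embodiment 1; the corresponding technical effects are achieved.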
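It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
The sequence numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution herein, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments herein.
The beneficial effects of the embodiments herein are as follows: in each of the above embodiments, the collected MR data of the target user is input into a random forest classifier for classification calculation, so that the indoor and outdoor mark of the target user can be determined from the classification calculation. This effectively reduces the misjudgment rate in determining the indoor and outdoor marks of users, and because the judgment is based on MR data, it effectively guarantees the real-time nature of the determination process.
The embodiments herein have been described above with reference to the drawings, but this document is not limited to the specific implementations described, which are merely illustrative and not restrictive. Inspired by this document, those of ordinary skill in the art can devise many other forms without departing from its purpose and the scope protected by the claims, and all of these fall within the protection of this document.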

Claims (10)

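  1. A method for distinguishing indoor and outdoor marks of users, wherein the method comprises:
    collecting measurement report data of a target user;
    inputting the measurement report data of the target user into a random forest classifier used to classify the indoor and outdoor marks of users;
    determining the indoor and outdoor mark of the target user according to the classification calculation of the random forest classifier.
  2. The method according to claim 1, wherein, before the inputting of the measurement report data of the target user into the random forest classifier used to classify the indoor and outdoor marks of users, the method comprises:
    extracting a training data set from the collected measurement report data of sample users in a target area and the actual indoor and outdoor mark corresponding to each piece of training data;
    inputting the training data set into a preset random forest classification model for training;
    during training, searching for the optimal model parameters of the random forest classification model through a grid search;
    using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
  3. The method according to claim 2, wherein, before the inputting of the training data set into the preset random forest classification model for training, the method comprises:
    setting each piece of training data in the training data set as an independent variable, and setting the actual indoor and outdoor mark corresponding to each piece of training data as a dependent variable determined by the independent variable.
  4. The method according to claim 2, wherein the using of the random forest classification model corresponding to the optimal model parameters as the random forest classifier comprises:
    extracting a test data set from the sample measurement report data;
    inputting the test data set into the random forest classification model corresponding to the optimal model parameters for prediction validation;
    determining the minimum mean square error between the prediction validation result and the actual indoor and outdoor marks corresponding to the test data set;
    when the mean square error is not greater than a preset threshold, using the random forest classification model corresponding to the optimal model parameters as the random forest classifier;
    when the mean square error is greater than the threshold, searching again for the optimal model parameters of the random forest classification model through a grid search.
  5. The method according to any one of claims 1-4, wherein the measurement report data includes an incidence angle, a time delay, a test power value, a downlink time delay, and a time difference.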
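  6. A training method for a random forest classifier, wherein the method comprises:
    extracting a training data set from the collected measurement report data of sample users in a target area and the actual indoor and outdoor mark corresponding to each piece of training data;
    inputting the training data set into a preset random forest classification model for training;
    during training, searching for the optimal model parameters of the random forest classification model through a grid search;
    using the random forest classification model corresponding to the optimal model parameters as the random forest classifier.
  7. A communication node device, wherein the device includes a memory and a processor, the memory stores a program for marking users as indoor or outdoor, and the processor executes the computer program to implement the steps of the method according to any one of claims 1-5.
  8. A training device for a random forest classifier, wherein the device includes a memory and a processor, the memory stores a training program of the random forest classifier, and the processor executes the computer program to implement the steps of the method according to claim 6.
  9. A computer-readable storage medium, wherein the storage medium stores a program for marking users as indoor or outdoor, and the computer program may be executed by at least one processor to implement the steps of the method according to any one of claims 1-5.
  10. A computer-readable storage medium, wherein the storage medium stores a training program of a random forest classifier, and the computer program may be executed by at least one processor to implement the steps of the method according to claim 6.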
PCT/CN2019/109438 2018-12-25 2019-09-30 Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium WO2020134299A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811595402.5A CN111368862A (zh) 2018-12-25 2018-12-25 Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium
CN201811595402.5 2018-12-25

Publications (1)

Publication Number Publication Date
WO2020134299A1 (zh) 2020-07-02

Family

ID=71128575

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/109438 WO2020134299A1 (zh) 2018-12-25 2019-09-30 Method for distinguishing indoor and outdoor marks, classifier training method, device, and medium

Country Status (2)

Country Link
CN (1) CN111368862A (zh)
WO (1) WO2020134299A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113398569A (zh) * 2021-06-15 2021-09-17 网易(杭州)网络有限公司 卡组分类的处理、模型的训练、卡组的搜索方法及设备
CN113993068A (zh) * 2021-10-18 2022-01-28 郑州大学 定位寻向***、方法和ble定位设备
CN117909886A (zh) * 2024-03-18 2024-04-19 南京海关工业产品检测中心 一种基于优化随机森林模型的锯齿棉品级分类方法及***

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181055A (zh) * 2020-09-28 2021-01-05 广东小天才科技有限公司 室内外状态判断方法及可穿戴设备、计算机可读存储介质
CN115082767A (zh) * 2021-03-15 2022-09-20 ***通信集团福建有限公司 随机森林模型训练方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101695152A (zh) * 2009-10-12 2010-04-14 中国科学院计算技术研究所 室内定位的方法及其***
WO2010148769A1 (zh) * 2009-11-11 2010-12-29 中兴通讯股份有限公司 用户终端定位方法、装置以及用户终端导航方法、装置
CN104239034A (zh) * 2014-08-19 2014-12-24 北京奇虎科技有限公司 智能电子设备的场合识别方法和信息通知方法及其装置
CN105025440A (zh) * 2015-07-09 2015-11-04 深圳天珑无线科技有限公司 室内外场景检测方法及设备
CN108151743A (zh) * 2017-12-13 2018-06-12 联想(北京)有限公司 室内外位置识别方法及***

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108616900B (zh) * 2016-12-12 2021-06-11 ***通信有限公司研究院 一种室内外测量报告的区分方法及网络设备
CN109034177B (zh) * 2018-05-24 2022-07-29 东南大学 一种移动智能终端室内外识别方法

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113398569A (zh) * 2021-06-15 2021-09-17 网易(杭州)网络有限公司 卡组分类的处理、模型的训练、卡组的搜索方法及设备
CN113398569B (zh) * 2021-06-15 2024-02-02 网易(杭州)网络有限公司 卡组分类的处理、模型的训练、卡组的搜索方法及设备
CN113993068A (zh) * 2021-10-18 2022-01-28 郑州大学 定位寻向***、方法和ble定位设备
CN113993068B (zh) * 2021-10-18 2024-01-30 郑州大学 定位寻向***、方法和ble定位设备
CN117909886A (zh) * 2024-03-18 2024-04-19 南京海关工业产品检测中心 一种基于优化随机森林模型的锯齿棉品级分类方法及***
CN117909886B (zh) * 2024-03-18 2024-05-24 南京海关工业产品检测中心 一种基于优化随机森林模型的锯齿棉品级分类方法及***

Also Published As

Publication number Publication date
CN111368862A (zh) 2020-07-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19906016

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 19/11/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19906016

Country of ref document: EP

Kind code of ref document: A1