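WO2021052031A1 - Early-warning method for commodity inventory risk based on a statistical interquartile range, and system and computer-readable storage medium thereof - Google Patents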

Info

Publication number
WO2021052031A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
inventory
early warning
abnormal
interquartile range
Application number
PCT/CN2020/105964
Other languages
English (en)
French (fr)
Inventor
欧文祥
徐亮
蒋旭曦
Original Assignee
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-20
Filing date
2020-07-30
Publication date
2021-03-25
Application filed by 苏宁云计算有限公司
Priority to CA3155689A1
Publication of WO2021052031A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087 Inventory or stock management, e.g. order filling, procurement or balancing against orders

Definitions

  • The invention belongs to the application of big data in retail risk control, and specifically relates to a method, system, and computer-readable storage medium for early warning of commodity inventory risk based on a statistical interquartile range.
  • Research on data outlier detection currently focuses mainly on unsupervised anomaly detection.
  • Commonly used detection methods include statistical and probabilistic-model methods, linear-model-based methods, and similarity-measure-based methods.
  • Statistical methods mainly include the 3σ principle and box-plot analysis.
  • Linear-model-based methods mainly include PCA (principal component analysis) and One-class SVM (support vector machine); similarity-measure-based methods mainly include k-nearest neighbors and Isolation Forest.
  • Because there are many kinds of commodities, the data volume is very large, and commodity inventory data is a one-dimensional time series, linear-model and similarity-measure methods are relatively expensive to compute.
  • The present invention therefore adopts a statistical method.
  • The 3σ principle applies only to data that follow a normal distribution.
  • Under it, an outlier is a value whose deviation from the mean exceeds three standard deviations: P(|x-μ|>3σ)≤0.003.
  • A value more than 3σ from the mean occurs with probability below 0.003, a small-probability event, so such a value can be regarded as an outlier.
  • Inventory data is a time series. Many current detection methods ignore its temporal characteristics and consider the data set as a whole, so local outliers are easily missed. Inventory data also has characteristics of its own: for certain categories of goods it may remain unchanged for long periods, i.e., it contains a lot of duplicate data.
  • The purpose of the present invention is to provide a method and system for early warning of commodity inventory risk based on a statistical interquartile range, to overcome problems of the prior art such as high computational cost, missed outliers when the data volume is large, and low timeliness.
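  • A method for early warning of commodity inventory risk based on a statistical interquartile range comprises: obtaining the original commodity inventory data of all stores in a historical time period; calculating inventory increment data from it; calculating the upper and lower quartiles of the increment data and, from them, the interquartile range and the anomaly detection threshold; and detecting whether a new inventory increment exceeds the threshold, pushing it to the front end as an early warning if it does.
  • The interquartile range is IQR = Q3 - Q1 and the threshold is MAX = Q3 + 3 × IQR, where Q3 is the upper quartile, Q1 is the lower quartile, and MAX is the threshold.
  • Calculating the inventory increment data from the original commodity inventory data includes the following steps:
  • the original commodity inventory data is first grouped by store and product and sorted by time, and missing data is filled with zeros to obtain the preliminarily organized historical data; a difference operation on this data gives the initial increment data, whose absolute values, with all zeros removed, form the final inventory increment data.
  • The interquartile range is calculated by sorting the increment data in ascending order, taking the value at the 25% position as the lower quartile Q1 and the value at the 75% position as the upper quartile Q3, and computing IQR = Q3 - Q1.
  • The method further adopts a sliding time window, recalculating a new anomaly detection threshold at regular intervals.
  • For example, the latest inventory data is collected every day.
  • The new threshold is used to judge anomalies in inventory data over a following period, improving the timeliness of detection.
  • The method further includes manual review: after the front end receives the pushed abnormal data, business personnel check it to determine whether it is truly abnormal. Adding manual review after automatic detection further improves the accuracy of the judgment.
  • The Spark data platform is used for the grouping, sorting, and difference operations on the original commodity data, which improves computing capacity and processing efficiency.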
  • A commodity inventory risk early-warning system based on a statistical interquartile range includes:
  • a data collection module, which obtains the original commodity inventory data of all stores in a historical time period from the inventory database;
  • a data processing module, which processes the original commodity inventory data to obtain the inventory increment data;
  • a threshold calculation module, which calculates the upper and lower quartiles of the increment data and, from them, the interquartile range and the anomaly detection threshold;
  • an early-warning module, which detects whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judges it to be abnormal data and pushes it to the front end as an early warning.
  • The data processing module includes:
  • a data grouping unit, which groups the original commodity inventory data;
  • a data sorting unit, which sorts the original commodity inventory data by time and fills missing data with zeros;
  • a difference calculation unit, which performs a difference operation on the grouped and sorted data, takes the absolute value of the result, and removes all zero values to obtain the final inventory increment data.
  • The present invention also provides a computer-readable storage medium storing a computer program which, when run by a processor, executes each step of the method of the present invention.
  • The present invention uses the statistical interquartile range to calculate the threshold for abnormal inventory increments.
  • Computation is efficient and risks are located quickly and accurately. Compared with traditional manual audits and stocktaking, the workload is greatly reduced, and differences caused by subjective human factors are avoided.
  • When monitoring detects a value above the threshold, the invention actively warns the user or the front end, enabling a T+1 early-warning mode.
  • Performing anomaly detection on inventory data every day greatly improves the timeliness with which abnormal inventory risks are discovered.
  • The invention uses the Spark platform to process and compute the inventory increment data, exploiting its computing power on large data volumes and its advantages in iterative computation, together with multithreaded concurrent processing, which greatly improves data processing efficiency.
  • FIG. 1 is a schematic flowchart of a method for early warning of commodity inventory risk based on a statistical interquartile range in an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the quartiles, interquartile range, and threshold settings in an embodiment of the present invention.
  • FIG. 3 is a statistical schematic diagram of the change in inventory quantity of a commodity over the past year and the corresponding anomaly detection threshold in an embodiment of the invention.
  • FIG. 4 is a structural diagram of a commodity inventory risk early-warning system based on a statistical interquartile range in an embodiment of the present invention.
  • An embodiment of the present invention discloses a method for early warning of commodity inventory risk based on a statistical interquartile range.
  • The method includes the following steps:
  • commodity inventory data of all stores for a period before the current date is obtained from the commodity inventory database; for example, taking one year as the basis, data for the 12 months before the current month is collected.
  • Data in the commodity inventory database can be synchronized periodically to the HDFS (Hadoop Distributed File System) of a Hadoop cluster, so that it can be read directly from HDFS.
  • Calculating the inventory increment data includes:
  • the original commodity inventory data is first grouped by store and product and sorted by time;
  • missing data can be filled with zeros on a daily basis to obtain the preliminarily organized historical data;
  • the resulting inventory increment data is the daily inventory increment over the historical period.
  • Time can also be bucketed by week or month, giving weekly or monthly inventory increment data.
  • The system monitors new inventory increments in real time.
  • When a new inventory increment changes and exceeds the threshold, the system actively alerts the front end and users, prompting financial staff to pay attention.
  • Detected abnormal results are also synchronized to the application system's database and stored in MySQL (a relational database management system); a process engine automatically initiates an exception workflow to the responsible financial manager, who manually verifies the abnormal data and feeds back the final judgment.
  • A quartile, also called a quartile point, is, in statistics, one of the three values at the dividing points when all values are sorted from smallest to largest and split into four equal parts.
  • The first quartile Q1, also called the "lower quartile", equals the value at the 25% position after all values in the sample are sorted in ascending order.
  • The second quartile Q2, also called the "median", equals the value at the 50% position after all values in the sample are sorted in ascending order.
  • The third quartile Q3, also called the "upper quartile", equals the value at the 75% position after all values in the sample are sorted in ascending order.
  • Step (7): the store, date, and product information corresponding to the outliers detected in step (6) is sent to the relevant business department.
  • The business department checks it against information from all parties and on-site investigation; if a risk is confirmed, the company's legal department handles the next step to avoid greater losses.
  • FIG. 3 shows an inventory risk early-warning case for a store from June 2018 to June 2019. The risk level in January 2019 was clearly above the threshold, so it can basically be concluded that the store had data anomalies and a large financial risk.
  • The invention targets the fact that commodity inventory time series are easily affected by the macroeconomic situation, seasons, promotions, and similar factors: it computes the sample quartiles over a sliding window and derives the anomaly detection threshold from them, so abnormal inventory values can be detected more accurately.
  • The method has low computational overhead and a short program running time, enabling quasi-real-time detection.
  • Manual auditing and stocktaking, by contrast, involve a huge workload and low efficiency: an audit is usually performed only every few months or less often, and each audit lasts several days or longer. With the present invention, T+1 detection once a day can be realized; the task takes 15 minutes on average, after which possibly abnormal data is pushed to the responsible financial manager through a workflow and the relevant personnel arrange a targeted review.
  • Results can be fed back on the same day, closing the loop from risk discovery, risk early warning, and exception push through risk review, result feedback, and after-the-fact accountability, so abnormal risks are detected and avoided promptly.
  • The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, or the like.
  • The present invention also provides a commodity inventory risk early-warning system based on a statistical interquartile range, the system including:
  • a data collection module, which obtains the original commodity inventory data of all stores in a historical time period from the enterprise platform's commodity inventory database;
  • a data processing module, which processes the original commodity inventory data to obtain the inventory increment data;
  • an early-warning module, which detects whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judges it to be abnormal data and pushes it to the front end as an early warning. Front-end personnel, such as financial staff, can check the warning manually after receiving it to further confirm the risk.
  • The invention detects abnormal commodity inventory values quickly and accurately and can avoid abnormal risks effectively and in time.
  • The data processing module includes:
  • a data grouping unit, which groups the original commodity inventory data by store and product;
  • a data sorting unit, which sorts the original commodity inventory data by time, for example by day, and fills missing data with zeros; for example, if a commodity has no stock on a certain day, 0 is filled in;
  • a difference calculation unit, which performs a difference operation on the grouped and sorted data, takes the absolute value of the result, and removes all zero values to obtain the final inventory increment data.
  • When the data volume is large, for example on the order of 20 billion records on one platform, performing the difference calculation directly with traditional data analysis tools is basically infeasible, and traditional Java or database computation schemes also struggle.
  • Spark is therefore used for data processing, exploiting its computing power on large data volumes and its advantages in iterative computation, together with multithreaded concurrent processing.
  • The actual initialization takes only a few hours, greatly improving computing efficiency.
  • The functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own.
  • The integrated modules, systems, and platforms can be implemented in hardware or as software functional units.

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

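A method and system for early warning of commodity inventory risk based on a statistical interquartile range, and a computer-readable storage medium containing the method. The method includes: obtaining the original commodity inventory data of all stores within a historical time period; calculating inventory increment data from the original commodity inventory data; calculating the upper and lower quartiles of the inventory increment data, and calculating the interquartile range and an anomaly detection threshold from them; and detecting whether a new inventory increment exceeds the anomaly detection threshold, in which case it is judged to be abnormal data and pushed to the front end as an early warning. The method solves problems of the prior art such as high computational cost, missed outliers when the data volume is large, and poor timeliness.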

Description

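Early-warning method for commodity inventory risk based on a statistical interquartile range, and system and computer-readable storage medium thereof
Technical Field
The present invention relates to the application of big data in retail risk control, and specifically to a method, a system, and a computer-readable storage medium for early warning of commodity inventory risk based on a statistical interquartile range.
Background
The physical stores of large retail enterprises are currently growing very rapidly. Because of limitations in management, store managers may, for personal gain, carry out illegal operations that cause the enterprise significant economic losses. Such problems are currently checked mostly after the fact, through financial audits, stocktaking, and similar means, which is very untimely; by the time a problem is found, it is often difficult to fully recover the financial loss. An efficient and accurate method is therefore needed that, by monitoring and analyzing sales and financial indicators, can promptly discover suspicious risks and notify the relevant personnel for review and verification.
Because financial statements inherently lag, the user, when selecting indicators, first ruled out early warning based on financial indicators and focused instead on payments and inventory, two core business indicators closely tied to sales. Further analysis of the business and system data showed that illegal operations do not pass through the company's sales system and therefore do not appear in payments, but they must involve moving goods into or out of stock and therefore do appear in the inventory indicator. Anomaly detection on commodity inventory data was therefore chosen to discover and warn of risks promptly.
Research on data outlier detection currently focuses mainly on unsupervised anomaly detection. Common detection methods include statistical and probabilistic-model methods, linear-model-based methods, and similarity-measure-based methods. Statistical methods mainly include the 3σ principle and box-plot analysis; linear-model-based methods mainly include PCA (principal component analysis) and One-class SVM (support vector machine); similarity-measure-based methods mainly include k-nearest neighbors and Isolation Forest. Because there are many kinds of commodities, the data volume is very large, and commodity inventory data is a one-dimensional time series, methods based on linear models and similarity measures are computationally expensive; with real-time computation in mind, the present invention adopts a statistical method.
The 3σ principle applies only to normally distributed data. Under it, an outlier is defined as a value whose deviation from the mean exceeds three standard deviations, P(|x-μ|>3σ)≤0.003, where μ is the mean and σ is the standard deviation. Under the normal-distribution assumption, a value more than 3σ from the mean occurs with probability below 0.003, a small-probability event, so it can be treated as an outlier. In practice, however, inventory data usually does not follow a normal distribution, so the 3σ principle does not apply. A box plot, by contrast, places no restriction on the data distribution and simply shows the data as it is. Its identification of outliers is relatively objective, and because its criterion is based on quartiles and the interquartile range, up to 25% of the data can move arbitrarily far without disturbing that criterion, which makes it more robust. When the sample sequence is large, however, applying a box plot to all of the data at once tends to miss outliers.
Inventory data is a time series. Many current detection methods ignore its temporal variation and consider the full data set instead, so local outliers are easily missed. In addition, inventory data has characteristics of its own: for some categories of goods the inventory may stay unchanged for long periods, i.e., the data contains a large number of duplicate values.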
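To illustrate the robustness argument above, the following Python sketch compares the 3σ bounds with the box-plot style upper fence Q3 + 3 × IQR on a small, made-up set of increments that contains two large values. The numbers are illustrative only and do not come from the patent; the quartile convention (`statistics.quantiles` with `method="inclusive"`, which interpolates at position 1 + (n - 1) × q) is an assumption of this sketch.

```python
import statistics


def three_sigma_bounds(values):
    """Return (mean - 3*sigma, mean + 3*sigma) using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma


def iqr_upper_fence(values, k=3.0):
    """Return Q3 + k * IQR; quartiles interpolate at position 1 + (n - 1) * q."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


# Illustrative increments (not from the patent): nine ordinary values, two large ones.
increments = [2, 3, 3, 4, 5, 5, 6, 7, 8, 200, 250]
_, upper_3sigma = three_sigma_bounds(increments)
fence = iqr_upper_fence(increments)
print([v for v in increments if v > upper_3sigma])  # []
print([v for v in increments if v > fence])         # [200, 250]
```

With these numbers the two large values inflate σ so much that the 3σ upper bound lands above 300 and flags nothing, while the quartile-based fence (19.5) still flags both; the quartiles barely move because the large values sit in the top 25% of the sample.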
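Therefore, how to design a method that detects abnormal data accurately and promptly when the data volume is large has become an urgent problem.
Summary of the Invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a method and system for early warning of commodity inventory risk based on a statistical interquartile range, to overcome problems of the prior art such as high computational cost, missed outliers when the data volume is large, and poor timeliness.
The technical solution adopted by the present invention is as follows:
A method for early warning of commodity inventory risk based on a statistical interquartile range, the method comprising:
obtaining the original commodity inventory data of all stores within a historical time period;
calculating inventory increment data from the original commodity inventory data;
calculating the upper and lower quartiles of the inventory increment data, and calculating the interquartile range and the anomaly detection threshold from them;
detecting whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judging it to be abnormal data and pushing it to the front end as an early warning.
Further, the interquartile range is calculated as IQR = Q3 - Q1 and the anomaly threshold as MAX = Q3 + 3 × IQR, where Q3 is the upper quartile, Q1 is the lower quartile, and MAX is the threshold.
Further, calculating the inventory increment data from the original commodity inventory data comprises the steps of:
grouping the original commodity inventory data by store and product, sorting it by time, and filling missing data with zeros to obtain preliminarily organized historical data;
performing a difference operation on the preliminarily organized historical data to obtain initial inventory increment data;
taking the absolute value of the initial inventory increment data and removing all zero values to obtain the final inventory increment data.
Further, calculating the interquartile range comprises:
sorting the inventory increment data in ascending order, taking the value at the 25% position as the lower quartile Q1 and the value at the 75% position as the upper quartile Q3, and computing IQR = Q3 - Q1.
Further, the method also comprises using a sliding time window to recompute a new anomaly detection threshold at regular intervals. For example, in T+1 mode the latest inventory data is collected once a day, the latest anomaly detection threshold is computed from it, and that threshold is used to judge anomalies in inventory data over a following period, improving the timeliness of detection.
Further, the method also comprises: after the front end receives the pushed abnormal data, business personnel manually review it to confirm whether it is truly abnormal. Adding manual review after automatic detection further improves the accuracy of the judgment.
Further, the grouping, sorting, and difference operations on the original commodity data are performed on the Spark data platform, which improves computing capacity and processing efficiency.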
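Based on another concept of the present invention, a commodity inventory risk early-warning system based on a statistical interquartile range is also provided, the system comprising:
a data collection module, configured to obtain the original commodity inventory data of all stores within a historical time period from an inventory database;
a data processing module, configured to process the original commodity inventory data to obtain inventory increment data;
a threshold calculation module, configured to calculate the upper and lower quartiles of the inventory increment data and, from them, the interquartile range and the anomaly detection threshold;
an early-warning module, configured to detect whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judge it to be abnormal data and push it to the front end as an early warning.
Further, the data processing module comprises:
a data grouping unit, configured to group the original commodity inventory data;
a data sorting unit, configured to sort the original commodity inventory data by time and fill missing data with zeros;
a difference calculation unit, configured to perform a difference operation on the grouped and sorted data, take the absolute value of the result, and remove all zero values to obtain the final inventory increment data.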
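One way to picture how the four modules cooperate is to inject each one as a callable into a small coordinator. The class and field names below are hypothetical, and fitting one threshold per (store, product) series is an assumption drawn from FIG. 3, which shows a threshold for a single commodity; the description does not state it outright.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, str]  # hypothetical (store, product) key


@dataclass
class InventoryRiskWarningSystem:
    """Coordinator for the four modules; each module is supplied as a callable."""

    collect: Callable[[], Dict[Key, List[float]]]  # data collection module
    process: Callable[[List[float]], List[float]]   # data processing module
    threshold: Callable[[Sequence[float]], float]   # threshold calculation module
    warn: Callable[[Key, float, float], None]       # early-warning module

    def run(self, new_increments: Dict[Key, float]) -> List[Key]:
        """Fit one threshold per series and warn when its new increment exceeds it."""
        alerted: List[Key] = []
        for key, levels in self.collect().items():
            history = self.process(levels)
            # Skip series with no new data or too little history to fit quartiles.
            if key not in new_increments or len(history) < 2:
                continue
            limit = self.threshold(history)
            if new_increments[key] > limit:
                self.warn(key, new_increments[key], limit)
                alerted.append(key)
        return alerted
```

In a deployment, `collect` would read from the inventory database or HDFS, `process` would wrap the grouping, sorting and difference units, `threshold` would compute Q3 + 3 × IQR, and `warn` would push to the front end; keeping them injectable lets each module be tested in isolation.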
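The present invention also provides a computer-readable storage medium storing a computer program which, when run by a processor, executes the steps of the method of the present invention.
Compared with the prior art, the disclosed method and system for early warning of commodity inventory risk based on a statistical interquartile range have the following technical effects:
1. The present invention uses the statistical interquartile range to calculate the threshold for abnormal inventory increments. Computation is efficient and risks are located quickly and accurately; compared with traditional manual auditing and stocktaking, the workload is greatly reduced and differences caused by subjective human factors are avoided.
2. By monitoring subsequent daily inventory data for anomalies and actively warning the user or the front end when a value exceeds the threshold, the invention enables a T+1 early-warning mode; detecting anomalies in inventory data every day greatly improves the timeliness with which abnormal inventory risks are discovered.
3. The invention processes the inventory increment data on the Spark platform, exploiting its computing power on large data volumes and its advantages in iterative computation, together with multithreaded concurrent processing, which greatly improves data processing efficiency.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the method for early warning of commodity inventory risk based on a statistical interquartile range in an embodiment of the present invention.
FIG. 2 is a schematic diagram of the quartiles, interquartile range, and threshold settings in an embodiment of the present invention.
FIG. 3 is a statistical schematic diagram of the change in inventory quantity of a commodity over the past year and the corresponding anomaly detection threshold in an embodiment of the invention.
FIG. 4 is an architecture diagram of the commodity inventory risk early-warning system based on a statistical interquartile range in an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the technical solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to FIGS. 1 to 3, an embodiment of the present invention discloses a method for early warning of commodity inventory risk based on a statistical interquartile range, the method comprising the following steps: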
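Obtain the original commodity inventory data of all stores within a historical time period.
Specifically, commodity inventory data of all stores for a recent period before the current date is obtained from the commodity inventory database; for example, taking one year as the basis, data for the 12 months preceding the current month is collected. In addition, the data in the commodity inventory database can be synchronized periodically to the HDFS (Hadoop Distributed File System) of a Hadoop cluster, so that it can be read directly from the HDFS platform.
Calculate inventory increment data from the original commodity inventory data.
Specifically, this step comprises:
grouping the original commodity inventory data by store and product and sorting it by time, for example by day, with missing data filled with zeros, to obtain the preliminarily organized historical data;
performing a difference operation on the preliminarily organized historical data to obtain the initial inventory increment data;
taking the absolute value of the initial inventory increment data and removing all zero values to obtain the final inventory increment data. This inventory increment data is the daily inventory increment over the historical period. The time can of course also be bucketed by week or month, giving weekly or monthly inventory increment data.
Calculate the upper and lower quartiles of the inventory increment data, and calculate the interquartile range and the anomaly detection threshold from them.
Here, the inventory increment data is sorted in ascending order, the value at the 25% position is taken as the lower quartile Q1 and the value at the 75% position as the upper quartile Q3, and IQR = Q3 - Q1; the anomaly threshold is calculated as MAX = Q3 + 3 × IQR, where Q3 is the upper quartile, Q1 is the lower quartile, and MAX is the threshold.
Detect whether a new inventory increment exceeds the anomaly detection threshold; if it does, judge it to be abnormal data and push it to the front end as an early warning.
In this step, the system monitors new inventory increments in real time; when a new increment changes and exceeds the threshold, the system actively alerts the front end and users, prompting financial staff to pay attention. The detected abnormal results are also synchronized to the application system's database and stored in MySQL (a relational database management system); a process engine automatically initiates an exception workflow to the responsible financial manager, who can manually verify the abnormal data and feed back the final judgment.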
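The persistence and review loop described above can be sketched with Python's built-in `sqlite3` module standing in for the MySQL database the patent names. The table name `inventory_alert`, its columns, and the status values are hypothetical; the process engine that routes each row to a financial manager is outside this sketch.

```python
import sqlite3


def push_anomalies(conn, anomalies):
    """Store detected anomalies so a downstream workflow can route them for review.

    conn: a DB-API connection (sqlite3 stands in for MySQL here).
    anomalies: iterable of (store, product, day, increment, threshold) tuples.
    Table and column names are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory_alert ("
        " store TEXT, product TEXT, day TEXT, increment REAL, threshold REAL,"
        " status TEXT DEFAULT 'PENDING_REVIEW', verdict TEXT)"
    )
    conn.executemany(
        "INSERT INTO inventory_alert (store, product, day, increment, threshold)"
        " VALUES (?, ?, ?, ?, ?)",
        anomalies,
    )
    conn.commit()


def record_verdict(conn, alert_id, verdict):
    """Record the financial manager's final judgment for one alert (feedback step)."""
    conn.execute(
        "UPDATE inventory_alert SET status = 'REVIEWED', verdict = ? WHERE rowid = ?",
        (verdict, alert_id),
    )
    conn.commit()
```

New rows start in the `PENDING_REVIEW` state and move to `REVIEWED` once a verdict is recorded. With a MySQL driver the placeholder style differs (many drivers use `%s` rather than `?`), so the SQL strings would need adjusting.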
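A complete specific embodiment of the method of the present invention is described below.
(1) Data input: obtain the commodity inventory data for the past year up to the current date from the big data platform.
(2) Data preprocessing: use Spark to group the raw data by store and product, sort it chronologically (by day), and fill missing data with zeros, giving the preliminarily organized historical data {a_n}, n = 1, 2, 3, ..., 365, where each n corresponds to a specific date.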
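(3) Use Spark to difference the preliminarily organized historical data {a_n} from step (2), giving the inventory increment data {b_n}, where b_1 = 0 and b_n = a_n - a_{n-1} for n = 2, 3, ..., 365.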
(4) Take the absolute value of the differenced data {b_n} and remove all zero values, giving the zero-free inventory increment data {c_n}.
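Steps (2) to (4) can be sketched in plain Python on a single machine; the patent runs them on Spark, which this sketch does not attempt to reproduce. The input record layout `(store, product, day, quantity)` and the function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_daily_series(records, start, end):
    """Step (2): group by (store, product), order by day, fill missing days with 0.

    records: iterable of (store, product, day, quantity) tuples, an assumed layout.
    Returns {(store, product): [a_1, ..., a_N]} covering every day in [start, end].
    """
    grouped = defaultdict(dict)
    for store, product, day, qty in records:
        grouped[(store, product)][day] = qty
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {key: [by_day.get(d, 0) for d in days] for key, by_day in grouped.items()}


def increments(levels):
    """Steps (3)-(4): b_1 = 0, b_n = a_n - a_(n-1); keep |b_n| for the non-zero b_n."""
    b = [0] + [cur - prev for prev, cur in zip(levels, levels[1:])]
    return [abs(x) for x in b if x != 0]


records = [
    ("S1", "P1", date(2019, 1, 1), 10),
    ("S1", "P1", date(2019, 1, 2), 10),
    ("S1", "P1", date(2019, 1, 3), 7),
    ("S1", "P1", date(2019, 1, 6), 12),  # no records on Jan 4 and Jan 5
]
series = build_daily_series(records, date(2019, 1, 1), date(2019, 1, 6))
print(series[("S1", "P1")])              # [10, 10, 7, 0, 0, 12]
print(increments(series[("S1", "P1")]))  # [3, 7, 12]
```

Zero-filling the missing 4th and 5th turns a gap in the records into a drop to 0 followed by a jump to 12. That follows directly from the zero-fill rule, and whether it is desirable depends on whether a missing record really means zero stock.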
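(5) Calculate the upper and lower quartiles of the inventory increment data {c_n} from step (4). A quartile, also called a quartile point, is, in statistics, one of the three values located at the dividing points when all values are sorted from smallest to largest and divided into four equal parts. The first quartile Q1, also called the "lower quartile", equals the value at the 25% position after all values in the sample are sorted in ascending order; the second quartile Q2, also called the "median", equals the value at the 50% position; and the third quartile Q3, also called the "upper quartile", equals the value at the 75% position.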
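(5.1) Calculate the lower quartile Q1. Sort the inventory increment data {c_n} in ascending order to obtain {d_n}. The position of Q1 is p_1 = 1 + (|{d_n}| - 1) × 0.25, where |{d_n}| is the number of data points. The lower quartile is then
Q1 = d_{⌊p_1⌋} + (p_1 - ⌊p_1⌋) × (d_{⌊p_1⌋+1} - d_{⌊p_1⌋})
where the symbol ⌊ ⌋ denotes rounding down.
(5.2) Calculate the upper quartile Q3. Sort the inventory increment data {c_n} in ascending order to obtain {d_n}. The position of Q3 is p_3 = 1 + (|{d_n}| - 1) × 0.75, where |{d_n}| is the number of data points. The upper quartile is then
Q3 = d_{⌊p_3⌋} + (p_3 - ⌊p_3⌋) × (d_{⌊p_3⌋+1} - d_{⌊p_3⌋})
where the symbol ⌊ ⌋ denotes rounding down.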
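Steps (5.1) and (5.2) can be written as one function of the quantile fraction q. The sketch assumes linear interpolation between neighbouring sorted values when the position p = 1 + (n - 1) × q is fractional; that is the convention the position formula implies, and it gives the same results as Python's `statistics.quantiles(..., method="inclusive")`. The function name is hypothetical.

```python
import math


def quartile_by_position(values, q):
    """Value at fraction q of the sorted sample, via position p = 1 + (n - 1) * q.

    Assumption: a fractional p is resolved by linear interpolation between the
    floor(p)-th and (floor(p) + 1)-th sorted values (1-based positions).
    """
    d = sorted(values)
    n = len(d)
    if n == 0:
        raise ValueError("empty sample")
    p = 1 + (n - 1) * q
    k = math.floor(p)          # floor(p), a 1-based position
    frac = p - k
    if k >= n:                 # only when q == 1: the maximum
        return d[-1]
    return d[k - 1] + frac * (d[k] - d[k - 1])


sample = [3, 7, 12, 1, 9, 4, 15, 6]
print(quartile_by_position(sample, 0.25))  # 3.75
print(quartile_by_position(sample, 0.75))  # 9.75
```

For this sample n = 8, so p_1 = 2.75 and p_3 = 6.25; the sorted values are [1, 3, 4, 6, 7, 9, 12, 15], giving Q1 = 3 + 0.75 × (4 - 3) = 3.75 and Q3 = 9 + 0.25 × (12 - 9) = 9.75. The threshold formula of step (6) then gives IQR = 6 and MAX = 9.75 + 3 × 6 = 27.75.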
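(6) From the upper and lower quartiles calculated in step (5), calculate the interquartile range IQR = Q3 - Q1 and the anomaly threshold MAX = Q3 + 3 × IQR. New inventory increment data can then be checked against this threshold, and values above MAX are considered outliers. In FIG. 2, the upper limit marked by the upper T-shaped whisker is the anomaly detection threshold MAX; the detection results are shown in FIG. 3, where data above the threshold line can be regarded as abnormal.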
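Step (6) and the subsequent check can be sketched as two small functions. The sketch assumes a new increment is computed the same way as the historical ones (the absolute day-over-day change) and treats "exceeds" as strictly greater than MAX; the function names are hypothetical.

```python
import statistics


def anomaly_threshold(increments, k=3.0):
    """Step (6): MAX = Q3 + k * IQR with IQR = Q3 - Q1 (the patent uses k = 3).

    method="inclusive" places the quartiles at positions 1 + (n - 1) * q,
    the positions used in steps (5.1) and (5.2).
    """
    if len(increments) < 2:
        raise ValueError("at least two increments are needed to fit quartiles")
    q1, _, q3 = statistics.quantiles(increments, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_new_increments(new_increments, threshold):
    """Return (index, value) for each new increment strictly greater than MAX."""
    return [(i, x) for i, x in enumerate(new_increments) if x > threshold]


history = [3, 7, 12, 1, 9, 4, 15, 6]
limit = anomaly_threshold(history)
print(limit)                                         # 27.75
print(flag_new_increments([5, 30, 27.75, 40], limit))  # [(1, 30), (3, 40)]
```

An increment exactly equal to MAX (27.75 here) is not flagged; if the business rule should include the boundary, the comparison becomes `>=`.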
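(7) Send the store, date, product, and other information corresponding to the outliers detected in step (6) to the relevant business department, which verifies it using information from all parties and on-site investigation. If a risk is confirmed, the company's legal department can handle the next steps to avoid greater losses. FIG. 3 shows an inventory risk early-warning case for a store from June 2018 to June 2019: the risk level in January 2019 was clearly above the threshold, so it can basically be concluded that the store had data anomalies and a large financial risk.
(8) Every month, reselect the most recent year of inventory data, which amounts to a sliding time window, and repeat steps (1) to (7) to recompute the threshold; use that threshold to detect anomalies in the following month's inventory data and issue timely warnings until the threshold is next recomputed.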
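Step (8) can be sketched as a loop over calendar months: for each month, fit MAX on the preceding window of increments and check that month's increments against it. `window_start` and `monthly_sliding_detection` are hypothetical names; the window length is a parameter (the patent uses 12 months), and the input is assumed to be one store/product series of absolute, non-zero daily increments keyed by date.

```python
import statistics
from datetime import date


def window_start(month_start, months=12):
    """First day of the month lying `months` calendar months before `month_start`."""
    total = month_start.year * 12 + (month_start.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


def monthly_sliding_detection(increments_by_day, months=12, k=3.0):
    """Refit MAX = Q3 + k * IQR every month on the preceding window, then flag that month.

    increments_by_day: dict of datetime.date -> absolute, non-zero increment.
    Returns {(year, month): sorted list of flagged dates}; months whose window
    holds fewer than two increments are skipped.
    """
    flagged = {}
    for year, month in sorted({(d.year, d.month) for d in increments_by_day}):
        end = date(year, month, 1)
        start = window_start(end, months)
        history = [v for d, v in increments_by_day.items() if start <= d < end]
        if len(history) < 2:
            continue
        q1, _, q3 = statistics.quantiles(history, n=4, method="inclusive")
        limit = q3 + k * (q3 - q1)
        flagged[(year, month)] = sorted(
            d for d, v in increments_by_day.items()
            if (d.year, d.month) == (year, month) and v > limit
        )
    return flagged
```

For June 2019, `window_start(date(2019, 6, 1))` returns June 1, 2018, so the threshold for June 2019 is fitted on June 2018 through May 2019. The T+1 variant in the summary would apply the same logic with a daily rather than monthly refit.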
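The method for early warning of commodity inventory risk based on a statistical interquartile range provided by the present invention addresses the fact that commodity inventory time series are easily affected by the macroeconomic situation, seasons, promotions, and similar factors: it computes the sample quartiles over a sliding window and derives the anomaly detection threshold from them, so abnormal inventory values can be detected more accurately. In addition, the method has low computational overhead and a short program running time, enabling quasi-real-time detection.
Manual auditing and stocktaking, by comparison, involve a huge workload and low efficiency: audits are generally carried out only every few months or less often, and each audit lasts several days or longer. With the method of the present invention, T+1 detection once a day can be achieved. The task runs in about 15 minutes on average, after which the possibly abnormal data detected is pushed to the responsible financial manager through a workflow; the relevant personnel carry out a targeted review and can feed back results the same day. This closes the loop from risk discovery, risk early warning, and exception push through risk review, result feedback, and after-the-fact accountability, so abnormal risks are discovered and avoided promptly and effectively.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by hardware instructed by a program. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, or the like.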
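Corresponding to the method of the above embodiment, and referring to FIG. 4, the present invention also provides a commodity inventory risk early-warning system based on a statistical interquartile range, the system comprising:
a data collection module, configured to obtain the original commodity inventory data of all stores within a historical time period from the enterprise platform's own commodity inventory database;
a data processing module, configured to process the original commodity inventory data to obtain inventory increment data;
a threshold calculation module, configured to calculate the upper and lower quartiles of the inventory increment data and, from them, the interquartile range and the anomaly detection threshold, where the anomaly threshold is calculated as MAX = Q3 + 3 × IQR, Q3 being the upper quartile, Q1 the lower quartile, and MAX the threshold;
an early-warning module, configured to detect whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judge it to be abnormal data and push it to the front end as an early warning. Front-end personnel, such as financial staff, can also check the warning manually after receiving it to further confirm the risk.
Through the cooperation of the data collection module, data processing module, threshold calculation module, and early-warning module, the invention detects abnormal commodity inventory values quickly and accurately and can avoid abnormal risks promptly and effectively.
In this embodiment, the data processing module comprises:
a data grouping unit, configured to group the original commodity inventory data by store and product;
a data sorting unit, configured to sort the original commodity inventory data by time, for example by day, and fill missing data with zeros; for example, if there is no stock of a commodity on a certain day, 0 is filled in;
a difference calculation unit, configured to perform a difference operation on the grouped and sorted data, take the absolute value of the result, and remove all zero values to obtain the final inventory increment data. When the data volume is large, for example on the order of 20 billion records on one platform, performing the difference calculation directly with traditional data analysis tools is basically infeasible. Traditional Java or database computation schemes also struggle to complete a full year's calculation in one pass: the work has to be split or looped with added concurrency, which optimistically might take about 3 to 4 days, with subsequent daily incremental processing taking about 40 minutes. Since the threshold is planned to be re-initialized monthly, this efficiency falls far short of requirements. The embodiment of the present invention therefore uses Spark for data processing, exploiting its computing power on large data volumes and its advantages in iterative computation, together with multithreaded concurrent processing; the actual initialization takes only a few hours, greatly improving computing efficiency.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist physically on its own; the integrated modules, systems, and platforms may be implemented in hardware or as software functional units.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.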

Claims (10)

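  1. A method for early warning of commodity inventory risk based on a statistical interquartile range, characterized in that the method comprises:
    obtaining original commodity inventory data of all stores within a historical time period;
    calculating inventory increment data from the original commodity inventory data;
    calculating the upper and lower quartiles of the inventory increment data, and calculating the interquartile range and an anomaly detection threshold from the upper and lower quartiles;
    detecting whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judging it to be abnormal data and pushing it to the front end as an early warning.
  2. The early-warning method according to claim 1, characterized in that the interquartile range is calculated as IQR = Q3 - Q1 and the anomaly threshold is calculated as MAX = Q3 + 3 × IQR, where Q3 is the upper quartile, Q1 is the lower quartile, and MAX is the threshold.
  3. The early-warning method according to claim 2, characterized in that calculating the inventory increment data from the original commodity inventory data comprises the steps of:
    grouping the original commodity inventory data by store and product, sorting it by time, and filling missing data with zero values to obtain preliminarily organized historical data;
    performing a difference operation on the preliminarily organized historical data to obtain initial inventory increment data;
    taking the absolute value of the initial inventory increment data and removing all zero values to obtain the final inventory increment data.
  4. The early-warning method according to claim 2, characterized in that calculating the interquartile range comprises:
    sorting the inventory increment data in ascending order, taking the value at the 25% position as the lower quartile Q1 and the value at the 75% position as the upper quartile Q3, the interquartile range being IQR = Q3 - Q1.
  5. The early-warning method according to claim 1, characterized in that the method further comprises using a sliding time window mode to recalculate a new anomaly detection threshold at regular intervals.
  6. The early-warning method according to claim 1 or 5, characterized in that the method further comprises, after the front end receives the pushed abnormal data, manual review by business personnel to determine whether it is abnormal data.
  7. The early-warning method according to claim 3, characterized in that the grouping, sorting, and difference operations on the original commodity data are performed on the Spark data platform.
  8. A commodity inventory risk early-warning system based on a statistical interquartile range, characterized in that the system comprises:
    a data collection module, configured to obtain original commodity inventory data of all stores within a historical time period from an inventory database;
    a data processing module, configured to process the original commodity inventory data to obtain inventory increment data;
    a threshold calculation module, configured to calculate the upper and lower quartiles of the inventory increment data and to calculate the interquartile range and an anomaly detection threshold from the upper and lower quartiles;
    an early-warning module, configured to detect whether a new inventory increment exceeds the anomaly detection threshold and, if it does, judge it to be abnormal data and push it to the front end as an early warning.
  9. The early-warning system according to claim 8, characterized in that the data processing module comprises:
    a data grouping unit, configured to group the original commodity inventory data;
    a data sorting unit, configured to sort the original commodity inventory data by time and fill missing data with zero values;
    a difference calculation unit, configured to perform a difference operation on the grouped and sorted data, take the absolute value of the result, and remove all zero values to obtain the final inventory increment data.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when run by a processor, executes the steps of the method according to any one of claims 1 to 7.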
PCT/CN2020/105964 2019-09-20 2020-07-30 基于统计四分位距的商品库存风险预警方法、系统及计算机可读存储介质 WO2021052031A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3155689A CA3155689A1 (en) 2019-09-20 2020-07-30 Early-warning method for commodity inventory risk based on a statistical interquartile range, and system and computer-readable storage medium thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910892438.8 2019-09-20
CN201910892438.8A CN110751371B (zh) 2019-09-20 2019-09-20 基于统计四分位距的商品库存风险预警方法、系统及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2021052031A1 true WO2021052031A1 (zh) 2021-03-25

Family

ID=69276761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105964 WO2021052031A1 (zh) 2019-09-20 2020-07-30 基于统计四分位距的商品库存风险预警方法、系统及计算机可读存储介质

Country Status (3)

Country Link
CN (1) CN110751371B (zh)
CA (1) CA3155689A1 (zh)
WO (1) WO2021052031A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220327172A1 (en) * 2021-04-09 2022-10-13 Servicenow, Inc. Evaluation and Recommendation Engine for a Remote Network Management Platform
CN115188186A (zh) * 2022-06-28 2022-10-14 公安部交通管理科学研究所 一种区域内交通流量监测方法
CN115858894A (zh) * 2023-02-14 2023-03-28 温州众成科技有限公司 一种可视化的大数据分析方法
CN117454121A (zh) * 2023-12-22 2024-01-26 华能济南黄台发电有限公司 一种基于电厂安全预警的数据分析处理方法及系统
CN117556364A (zh) * 2024-01-12 2024-02-13 济南福深兴安科技有限公司 一种矿用矿压安全智能监测***

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751371B (zh) * 2019-09-20 2022-06-07 苏宁云计算有限公司 基于统计四分位距的商品库存风险预警方法、系统及计算机可读存储介质
CN111428886B (zh) * 2020-04-10 2023-08-04 青岛聚好联科技有限公司 一种故障诊断的深度学习模型自适应更新的方法及装置
CN111506625A (zh) * 2020-04-20 2020-08-07 中国建设银行股份有限公司 一种告警阈值确定方法及装置
CN111861410A (zh) * 2020-07-27 2020-10-30 北京百川盈孚科技有限公司 数据变动异常预警方法、***及装置
CN111967345B (zh) * 2020-07-28 2023-10-31 国网上海市电力公司 一种实时判定摄像头遮挡状态的方法
CN112001621A (zh) * 2020-08-21 2020-11-27 广州云徙科技有限公司 一种关键指标的智能预警方法
CN112070408A (zh) * 2020-09-14 2020-12-11 唐山学院 基于大数据和深度学习的烧结矿成分预报模型
CN112116014A (zh) * 2020-09-24 2020-12-22 贵州电网有限责任公司 一种配电自动化设备测试数据离群值检测方法
CN112700085A (zh) * 2020-12-11 2021-04-23 华南理工大学 基于关联规则的复杂***稳态运行参数优化方法、***和介质
CN112768062A (zh) * 2021-01-26 2021-05-07 武汉大学 提高医疗数值型数据矫正效率的方法、***及存储介质
CN113096393A (zh) * 2021-03-29 2021-07-09 中移智行网络科技有限公司 一种路况预警方法、装置及边缘云设备
CN113159597B (zh) * 2021-04-29 2022-08-16 浙江陀曼云计算有限公司 基于时序数据波动的机床生产间隔计算方法及***

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100114745A1 (en) * 2008-10-30 2010-05-06 Sap Ag System and method for calculating and applying market data change rate sets
CN103631681A (zh) * 2013-12-10 2014-03-12 国家电网公司 一种在线修复风电场异常数据的方法
CN104915846A (zh) * 2015-06-18 2015-09-16 北京京东尚科信息技术有限公司 一种电子商务时间序列数据的异常检测方法及***
CN110751371A (zh) * 2019-09-20 2020-02-04 苏宁云计算有限公司 基于统计四分位距的商品库存风险预警方法、系统及计算机可读存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10453142B2 (en) * 2009-02-11 2019-10-22 Johnathan Mun System and method for modeling and quantifying regulatory capital, key risk indicators, probability of default, exposure at default, loss given default, liquidity ratios, and value at risk, within the areas of asset liability management, credit risk, market risk, operational risk, and liquidity risk for banks
CN102339288B (zh) * 2010-07-21 2013-09-25 ***通信集团辽宁有限公司 数据仓库异常数据的检测方法及装置
US10762173B2 (en) * 2011-12-05 2020-09-01 Omnicell, Inc. System and method for managing inventory at dispensing units
CN104636933B (zh) * 2015-02-11 2017-12-12 广州唯品会信息科技有限公司 电子商务网站超卖原因定位的方法及装置
CN104657503A (zh) * 2015-03-13 2015-05-27 浪潮集团有限公司 一种基于统计判别法对电商销售额异常值的预处理方法
CN105808368B (zh) * 2016-03-15 2019-04-30 南京联成科技发展股份有限公司 一种基于随机概率分布的信息安全异常检测的方法及***
CN106407278B (zh) * 2016-08-26 2020-03-17 宝信软件(武汉)有限公司 一种大数据平台的架构设计系统
EP3759625A4 (en) * 2018-02-26 2021-11-24 Becton, Dickinson and Company INTERACTIVE VISUAL APPLICATION FOR SECURITY STOCK MODELING
CN108830510B (zh) * 2018-07-16 2021-09-21 国网上海市电力公司 一种基于数理统计的电力数据预处理方法
CN109191283A (zh) * 2018-08-30 2019-01-11 成都数联铭品科技有限公司 风险预警方法及***
CN109978386A (zh) * 2019-03-28 2019-07-05 六安致跃供应链管理有限公司 一种基于供应链的可视化库存系统及其可视化管理方法
CN110032670B (zh) * 2019-04-17 2022-11-29 腾讯科技(深圳)有限公司 时序数据的异常检测方法、装置、设备及存储介质

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220327172A1 (en) * 2021-04-09 2022-10-13 Servicenow, Inc. Evaluation and Recommendation Engine for a Remote Network Management Platform
CN115188186A (zh) * 2022-06-28 2022-10-14 公安部交通管理科学研究所 一种区域内交通流量监测方法
CN115188186B (zh) * 2022-06-28 2024-02-20 公安部交通管理科学研究所 一种区域内交通流量监测方法
CN115858894A (zh) * 2023-02-14 2023-03-28 温州众成科技有限公司 一种可视化的大数据分析方法
CN115858894B (zh) * 2023-02-14 2023-05-16 温州众成科技有限公司 一种可视化的大数据分析方法
CN117454121A (zh) * 2023-12-22 2024-01-26 华能济南黄台发电有限公司 一种基于电厂安全预警的数据分析处理方法及系统
CN117454121B (zh) * 2023-12-22 2024-04-05 华能济南黄台发电有限公司 一种基于电厂安全预警的数据分析处理方法及系统
CN117556364A (zh) * 2024-01-12 2024-02-13 济南福深兴安科技有限公司 一种矿用矿压安全智能监测***
CN117556364B (zh) * 2024-01-12 2024-03-29 济南福深兴安科技有限公司 一种矿用矿压安全智能监测***

Also Published As

Publication number Publication date
CN110751371A (zh) 2020-02-04
CA3155689A1 (en) 2021-03-25
CN110751371B (zh) 2022-06-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20864908

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3155689

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20864908

Country of ref document: EP

Kind code of ref document: A1