A Multivariate Frequency-Severity Model for Healthcare Data Breaches

Date: 2022-12-03

Guanghua Lecture (光华讲坛): Forum for Public Figures and Entrepreneurs, No. 6368


Speaker: Professor Zhao Peng, Jiangsu Normal University

Host: Associate Professor Zhou Ling, School of Statistics

Time: December 7, 9:30-10:30

Venue: Tencent Meeting, meeting ID: 128-575-836

Organizers: Center of Statistical Research; School of Statistics; Office of Scientific Research

About the speaker:

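Zhao Peng is a professor and Director of the Office of Academic Affairs at Jiangsu Normal University, and an adjunct doctoral supervisor at the University of Science and Technology of China and Lanzhou University. His roles include vice chairman of the Big Data Statistics Branch of the Chinese Association for Applied Statistics (中国现场统计研究会), standing director of the National Industrial Statistics Teaching Research Association, and vice president of the Young Statisticians Association. His honors include the National Science Fund for Excellent Young Scholars (“优青”), Jiangsu Province “Innovation and Entrepreneurship Talent” (“双创人才”), the Jiangsu Mathematics Achievement Award, a first prize of the Jiangsu Teaching Achievement Award (as first contributor), and leadership of an excellent teaching team under the Jiangsu universities “Qing Lan Project” (“青蓝工程”). He is an Associate Editor (AE) of Commun. Stat. and serves on the editorial boards of 《应用概率统计》 (Chinese Journal of Applied Probability and Statistics) and 《数理统计与管理》 (Journal of Applied Statistics and Management). His research focuses on reliability statistics and network reliability, and he has published more than 100 papers in leading journals such as Ann. Appl. Stat., Reliab. Eng. Syst. Saf., Eur. J. Oper. Res., IEEE Trans. Reliab., and Adv. Appl. Probab.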

Abstract:

Data breaches in healthcare have become a substantial concern in recent years, causing millions of dollars in financial losses each year. However, the study of healthcare data breaches has been hampered by a lack of suitable statistical approaches. We develop a novel multivariate frequency-severity framework to analyze, at the state level, both breach frequency and the number of affected individuals. A mixed-effects model is developed for the square-root-transformed frequency, and the log-gamma distribution is proposed to capture the skewness and heavy tail of the distribution of the number of affected individuals. We further find a positive nonlinear dependence between the transformed frequency and the log-transformed number of affected individuals (i.e., severity). Both in-sample and out-of-sample studies show that the proposed multivariate frequency-severity model, which accommodates nonlinear dependence, achieves satisfactory fitting and prediction performance.
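As a rough illustration of the quantities named in the abstract, the Python sketch below applies the square-root transform to breach counts, fits a gamma distribution to the log-transformed numbers of affected individuals by the method of moments, and uses Spearman's rank correlation as a simple check for a monotone (possibly nonlinear) positive dependence. It is not the speaker's method: the mixed-effects frequency model and the dependence model from the talk are not reproduced, all function names are hypothetical, and the toy data are made up. It assumes the convention that Y is log-gamma when log Y follows a gamma distribution (SciPy's `scipy.stats.loggamma` describes a different distribution, the log of a gamma variable).

```python
import math
from statistics import mean, pvariance


def sqrt_transform(counts):
    """Square-root transform of breach counts (a common variance-stabilizing
    transform for count data)."""
    return [math.sqrt(c) for c in counts]


def fit_log_gamma_moments(affected):
    """Method-of-moments fit of Gamma(shape, scale) to log(affected).

    Assumed convention: Y is log-gamma if log(Y) ~ Gamma(shape, scale).
    Every count must exceed 1 so that log(Y) > 0, and the log-counts must
    not all be equal (otherwise the variance is zero).
    """
    logs = [math.log(y) for y in affected]
    m = mean(logs)
    v = pvariance(logs, mu=m)  # population variance of the log-counts
    shape = m * m / v
    scale = v / m
    # Heavy tail: E[Y] = (1 - scale) ** (-shape) exists only when scale < 1.
    return shape, scale


def ranks(xs):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(xs, ys):
    """Pearson correlation of ranks: detects monotone, possibly nonlinear,
    dependence that a linear correlation can understate."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data, NOT from the talk: breach counts and
    # affected individuals for four made-up states.
    counts = [4, 9, 16, 25]
    affected = [600, 1500, 20000, 350000]
    freq = sqrt_transform(counts)
    shape, scale = fit_log_gamma_moments(affected)
    print(f"shape={shape:.3f}, scale={scale:.3f}")
    severity = [math.log(y) for y in affected]
    print(f"Spearman rho={spearman_rho(freq, severity):.3f}")
```

The method of moments is used only for brevity; maximum likelihood would be the more usual choice for fitting the severity distribution, and a rank correlation only signals dependence rather than modeling it.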
