Change point detection with ruptures

Original article by 冯小龙, Garden001

October 7, 2020, 01:05

1. What is ruptures

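ruptures is a Python library for change point detection. It analyzes and segments non-stationary signals, i.e. signals whose statistical properties change over time.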

2. A quick look at it in action

import matplotlib.pyplot as plt
import ruptures as rpt

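# Generate a piecewise-constant signal with Gaussian noise
n_samples, dim, sigma = 1000, 3, 4
n_bkps = 4  # number of change points
signal, bkps = rpt.pw_constant(n_samples, dim, n_bkps, noise_std=sigma)

# Detect change points with PELT and the "rbf" cost
algo = rpt.Pelt(model="rbf").fit(signal)
result = algo.predict(pen=10)

# Display the result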
rpt.display(signal, bkps, result)
plt.show()

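Running it produces a figure of the signal in which the true regimes are shown as alternating shaded backgrounds and the detected change points as dashed vertical lines.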

3. Installation

Install it directly with pip:

pip install ruptures

4. Resources

Project repository: https://github.com/deepcharles/ruptures
Official documentation: ctruong.perso.math.cnrs.fr/ruptures [1].
Paper by the authors: Truong, C., Oudre, L., & Vayatis, N. (2018). ruptures: change point detection in Python. arXiv e-prints, arXiv:1801.00826, 1–5.

5. Usage

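This section explains how to use the implemented algorithms. ruptures follows an object-oriented design: a change point detection method is built from two base concepts, BaseEstimator (the search method) and BaseCost (the cost function that measures how well a segment fits the chosen model).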

Initializing a new estimator: every change point detection algorithm inherits from the base class ruptures.base.BaseEstimator. Its initialization parameters include:

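'model': "l1", "l2", "normal", "rbf", "linear", "ar"; the cost function used to compute the approximation error.
'custom_cost': a user-defined cost function for the detection algorithm.
'jump': subsampling step; only every jump-th sample is considered as a candidate change point.
'min_size': the minimum number of samples between two change points.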

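Making predictions: the main methods are .fit, .predict and .fit_predict:

fit: takes the signal as input and fits the algorithm to it.
predict: performs change point detection and returns a sorted list of indices, one per segment: each index is the end (exclusive) of a segment, so the last one equals the number of samples.
fit_predict: a helper method that calls fit and then predict.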

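Custom cost function: to define your own cost function, create a class that inherits from ruptures.base.BaseCost and implement the methods .fit(signal) and .error(start, end). The .fit(signal) method takes the signal as input and sets whatever parameters the cost needs; .error(start, end) takes two indices, start and end, and returns the cost of the segment between start and end.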

6. Example 1

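Here is a dynamic programming example using ruptures.detection.Dynp. Dynp computes the cost of every subsequence of the given signal, so its complexity is of the order $\mathcal{O}(Kn^2)$, where K is the number of change points and n the number of samples.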
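To make the O(Kn²) figure concrete, here is a from-scratch sketch of the dynamic program for the L2 cost on a 1-D signal. It is not ruptures' Dynp code: the helpers segment_cost and dynp_l2 are hypothetical, and prefix sums are used so that each segment cost takes O(1) time. best[k, e] holds the lowest cost of splitting the first e samples with k change points; each entry scans all possible starts of the last segment, which gives the K·n² loop count.

```python
import numpy as np


def segment_cost(csum, csum2, start, end):
    # L2 cost of y[start:end] from prefix sums: sum of squared deviations from the mean.
    n = end - start
    s = csum[end] - csum[start]
    s2 = csum2[end] - csum2[start]
    return s2 - s * s / n


def dynp_l2(y, n_bkps, min_size=2):
    """Exact optimal split of a 1-D signal into n_bkps + 1 segments, O(K n^2)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csum2 = np.concatenate([[0.0], np.cumsum(y * y)])
    # best[k, e]: minimal cost of y[0:e] with k change points; arg[k, e]: last start.
    best = np.full((n_bkps + 1, n + 1), np.inf)
    arg = np.zeros((n_bkps + 1, n + 1), dtype=int)
    for e in range(min_size, n + 1):
        best[0, e] = segment_cost(csum, csum2, 0, e)
    for k in range(1, n_bkps + 1):
        for e in range((k + 1) * min_size, n + 1):
            # The last segment is y[s:e]; y[0:s] uses k - 1 change points.
            for s in range(k * min_size, e - min_size + 1):
                c = best[k - 1, s] + segment_cost(csum, csum2, s, e)
                if c < best[k, e]:
                    best[k, e] = c
                    arg[k, e] = s
    # Backtrack from the full signal to recover the change points.
    bkps = [n]
    e = n
    for k in range(n_bkps, 0, -1):
        e = int(arg[k, e])
        bkps.append(e)
    return sorted(bkps)


# Made-up toy signal: levels 0, 4, 0 (20 samples each) with light noise.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(20), np.full(20, 4.0), np.zeros(20)])
y = y + rng.normal(scale=0.3, size=y.shape)
result = dynp_l2(y, n_bkps=2)
print(result)
```

Like ruptures, the output lists segment ends, including the signal length as the last element.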

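The computational cost can be reduced considerably by considering only a subsample of the possible change points. When calling ruptures.detection.Dynp.__init__(), the keyword 'min_size' sets the minimum distance between change points, and the keyword 'jump' sets the subsampling step: only every jump-th index is considered as a candidate change point.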

import matplotlib.pyplot as plt
import ruptures as rpt

# Generate simulated data: piecewise-constant signal with Gaussian noise
n, dim = 500, 3
n_bkps, sigma = 3, 5
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)

# Change point detection with dynamic programming
model = "l1"  # alternatives: "l2", "rbf"
algo = rpt.Dynp(model=model, min_size=3, jump=5).fit(signal)
my_bkps = algo.predict(n_bkps=n_bkps)

# Show the true change points against the detected ones
rpt.display(signal, bkps, my_bkps, figsize=(10, 6))
plt.show()

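Dynamic programming is used here to find the optimal change points: given a segment model, it computes the partition that minimizes the sum of the segment errors. The resulting figure again shows the true regimes against the detected change points.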

References

[1] Link to documentation: http://ctruong.perso.math.cnrs.fr/ruptures
