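The previous chapter (lecture/Factors.ipynb) introduced what factors are and how to use them, and showed that TQuant Lab ships with many built-in factors. Factor research keeps growing, though, and new price-volume factors appear all the time; you may also have proprietary strategy factors of your own. This chapter therefore shows how to build a custom factor and use it in TQuant Lab.
Conceptually, a custom factor works much like a built-in one. Both take inputs, window_length, and mask as parameters, and both return an object of the Factor class.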
Suppose we want to compute a rolling standard deviation for every stock on every day. We can build it by subclassing zipline.pipeline.CustomFactor and implementing its compute method.
inputs: iterable, optional
The input data.
outputs: iterable[str], optional
The output factors.
window_length: int, optional
The lookback window applied to the input data.
mask: zipline.pipeline.Filter, optional
Determines which assets the factor is computed for.
import os
import pandas as pd
import numpy as np
import tejapi
import warnings
warnings.filterwarnings('ignore')
os.environ['TEJAPI_BASE'] = 'https://api.tej.com.tw'
os.environ['TEJAPI_KEY'] = 'YOUR KEY'
os.environ['mdate'] = '20080401 20230702'
os.environ['ticker'] = '2330 2409'
from zipline.pipeline import Pipeline, CustomFactor
from zipline.TQresearch.tej_pipeline import run_pipeline
from zipline.pipeline.data import TWEquityPricing
from zipline.pipeline.filters import StaticAssets,StaticSids
from zipline.api import sid, symbol
# ingest stock data
!zipline ingest -b tquant
Merging daily equity files:
[2023-10-25 08:14:55.609378] INFO: zipline.data.bundles.core: Ingesting tquant.
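In this example we use np.nanstd to compute the standard deviation of the input values. The input data and the lookback window are determined by the inputs and window_length passed to StdDev inside make_pipeline(). To get the 7-day closing-price standard deviation for TSMC (2330) and AUO (2409), StdDev is given inputs=[TWEquityPricing.close] and window_length=7.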
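Next, run_pipeline executes the Pipeline, computing the factor day by day over the backtest period and returning a DataFrame. The DataFrame has a MultiIndex made of the date and the asset, and every asset gets a 7-day closing-price standard deviation on every day.
run_pipeline runs the Pipeline and produces a table.
Its return value is a pd.DataFrame holding the Pipeline's results.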
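To make the np.nanstd step concrete, here is a minimal NumPy-only sketch of what reducing one window along axis 0 looks like. It assumes, as in zipline's CustomFactor, that each input reaches compute as a 2-D array with one row per day in the window and one column per asset. The prices and the `out` buffer below are invented stand-ins, not data from the bundle.

```python
import numpy as np

# Hypothetical 7-day window of closing prices for two assets
# (rows = days, columns = assets); the numbers are made up.
values = np.array([
    [100.0, 12.0],
    [101.0, 12.1],
    [102.0, np.nan],   # a missing price for the second asset
    [101.5, 12.3],
    [103.0, 12.2],
    [104.0, 12.4],
    [102.5, 12.6],
])

# Stand-in for the `out` buffer that compute() fills: one slot per asset.
out = np.empty(values.shape[1])

# Collapse the time axis (axis=0) while ignoring NaNs.
out[:] = np.nanstd(values, axis=0)

# np.nanstd defaults to ddof=0 (population standard deviation);
# pass ddof=1 if a sample standard deviation is wanted instead.
sample_std = np.nanstd(values, axis=0, ddof=1)

print(out.shape)   # (2,)
print(out, sample_std)
```

Because of the NaN handling, the second asset's value is computed from its six available prices rather than becoming NaN.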
class StdDev(CustomFactor):
    def compute(self, today, assets, out, values):
        # values holds the window: one row per day, one column per asset
        out[:] = np.nanstd(values, axis=0)

def make_pipeline():
    # 7-day rolling standard deviation of closing prices
    std_dev = StdDev(inputs=[TWEquityPricing.close], window_length=7)
    return Pipeline(
        columns={
            'std_dev': std_dev
        }
    )

result = run_pipeline(make_pipeline(), pd.Timestamp('2013-01-03', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))
result
date | asset | std_dev
---|---|---
2013-01-03 00:00:00+00:00 | Equity(0 [2330]) | 1.375737
2013-01-03 00:00:00+00:00 | Equity(1 [2409]) | 0.350946
2013-01-04 00:00:00+00:00 | Equity(0 [2330]) | 2.024644
2013-01-04 00:00:00+00:00 | Equity(1 [2409]) | 0.410947
2013-01-07 00:00:00+00:00 | Equity(0 [2330]) | 2.287053
... | ... | ...
2022-12-29 00:00:00+00:00 | Equity(1 [2409]) | 0.314772
2022-12-30 00:00:00+00:00 | Equity(0 [2330]) | 6.326975
2022-12-30 00:00:00+00:00 | Equity(1 [2409]) | 0.277562
2023-01-03 00:00:00+00:00 | Equity(0 [2330]) | 6.689163
2023-01-03 00:00:00+00:00 | Equity(1 [2409]) | 0.184888
4904 rows × 1 columns
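Since the result is indexed by (date, asset), it helps to know a few ways of slicing it. The sketch below rebuilds a four-row stand-in for the result with pandas alone: the values are copied from the table above, while plain strings replace the Equity objects, so it is an illustration of the indexing pattern rather than actual run_pipeline output.

```python
import pandas as pd

# Small stand-in for a run_pipeline result. Values are copied from the
# table above; plain strings replace the Equity objects for simplicity.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2013-01-03", "2013-01-04"], utc=True), ["2330", "2409"]]
)
result = pd.DataFrame(
    {"std_dev": [1.375737, 0.350946, 2.024644, 0.410947]}, index=index
)

# One asset across all dates: take a cross-section on the second level.
tsmc = result.xs("2330", level=1)

# One date across all assets: select on the first level.
first_day = result.loc[pd.Timestamp("2013-01-03", tz="UTC")]

# Wide layout: dates as rows, assets as columns.
wide = result["std_dev"].unstack()

print(tsmc)
print(first_day)
print(wide)
```

The wide layout is often the more convenient one for plotting or for comparing assets side by side.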
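When creating a custom factor, you can also preset its input parameters. In this example we build a factor that computes the 10-day mean of the difference between the closing and opening prices. Inside TenDayMeanDifference we declare inputs and window_length in advance as [TWEquityPricing.close, TWEquityPricing.open] and 10.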
class TenDayMeanDifference(CustomFactor):
    # Class-level defaults, used when no arguments are passed
    inputs = [TWEquityPricing.close, TWEquityPricing.open]
    window_length = 10
    def compute(self, today, assets, out, c_price, o_price):
        # Mean of (close - open) over the window, per asset
        out[:] = np.nanmean(c_price - o_price, axis=0)

def make_pipeline():
    close_open_diff = TenDayMeanDifference()
    return Pipeline(
        columns={
            'close_open_diff': close_open_diff
        }
    )

result = run_pipeline(make_pipeline(), pd.Timestamp('2013-01-03', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))
result
date | asset | close_open_diff
---|---|---
2013-01-03 00:00:00+00:00 | Equity(0 [2330]) | 0.100
2013-01-03 00:00:00+00:00 | Equity(1 [2409]) | -0.040
2013-01-04 00:00:00+00:00 | Equity(0 [2330]) | 0.090
2013-01-04 00:00:00+00:00 | Equity(1 [2409]) | -0.065
2013-01-07 00:00:00+00:00 | Equity(0 [2330]) | 0.200
... | ... | ...
2022-12-29 00:00:00+00:00 | Equity(1 [2409]) | -0.060
2022-12-30 00:00:00+00:00 | Equity(0 [2330]) | -0.150
2022-12-30 00:00:00+00:00 | Equity(1 [2409]) | -0.050
2023-01-03 00:00:00+00:00 | Equity(0 [2330]) | -1.250
2023-01-03 00:00:00+00:00 | Equity(1 [2409]) | -0.055
4904 rows × 1 columns
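With more than one input, the order of the inputs list matters: the arrays are handed to compute positionally, so the first entry arrives as c_price and the second as o_price. The following NumPy-only sketch mirrors that arithmetic on invented prices with a shortened 3-day window. The helper name mean_difference is hypothetical and exists only for this illustration.

```python
import numpy as np

def mean_difference(first, second):
    """Per-asset mean of (first - second) over the window, ignoring NaNs.

    Mirrors the arithmetic inside TenDayMeanDifference.compute.
    """
    return np.nanmean(first - second, axis=0)

# Made-up 3-day windows for two assets (rows = days, columns = assets).
close = np.array([[10.0, 20.0], [11.0, 19.0], [12.0, 21.0]])
open_ = np.array([[9.5, 20.5], [10.5, 19.5], [12.5, 20.5]])

close_minus_open = mean_difference(close, open_)
open_minus_close = mean_difference(open_, close)

# Swapping the input order flips the sign of the factor.
print(close_minus_open)
print(open_minus_close)
```

Listing the inputs as [open, close] instead of [close, open] would therefore silently turn the factor into its negative.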
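If TenDayMeanDifference is given new arguments inside make_pipeline (here TWEquityPricing.high and TWEquityPricing.low), they override the preset defaults (TWEquityPricing.close and TWEquityPricing.open). As a result, the table below differs from the one above.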
def make_pipeline():
    # Passing inputs here overrides the class-level default (close, open);
    # the column keeps its earlier name but now holds the mean of high - low
    close_open_diff = TenDayMeanDifference(inputs=[TWEquityPricing.high, TWEquityPricing.low])
    return Pipeline(
        columns={
            'close_open_diff': close_open_diff
        }
    )

result = run_pipeline(make_pipeline(), pd.Timestamp('2013-01-03', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))
result
date | asset | close_open_diff
---|---|---
2013-01-03 00:00:00+00:00 | Equity(0 [2330]) | 1.540
2013-01-03 00:00:00+00:00 | Equity(1 [2409]) | 0.520
2013-01-04 00:00:00+00:00 | Equity(0 [2330]) | 1.630
2013-01-04 00:00:00+00:00 | Equity(1 [2409]) | 0.515
2013-01-07 00:00:00+00:00 | Equity(0 [2330]) | 1.600
... | ... | ...
2022-12-29 00:00:00+00:00 | Equity(1 [2409]) | 0.375
2022-12-30 00:00:00+00:00 | Equity(0 [2330]) | 5.850
2022-12-30 00:00:00+00:00 | Equity(1 [2409]) | 0.370
2023-01-03 00:00:00+00:00 | Equity(0 [2330]) | 6.100
2023-01-03 00:00:00+00:00 | Equity(1 [2409]) | 0.360
4904 rows × 1 columns
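The override behaviour follows a common Python pattern: class-level attributes act as defaults, and constructor arguments replace them per instance. The sketch below shows that pattern in plain Python. SketchFactor is a hypothetical stand-in written for this illustration; it is not zipline's CustomFactor and makes no claim about how zipline implements the mechanism internally.

```python
class SketchFactor:
    """Hypothetical stand-in for a factor with preset parameters."""

    # Class-level defaults, analogous to those in TenDayMeanDifference.
    inputs = ("close", "open")
    window_length = 10

    def __init__(self, inputs=None, window_length=None):
        # Only replace a default when an argument is actually supplied.
        if inputs is not None:
            self.inputs = tuple(inputs)
        if window_length is not None:
            self.window_length = window_length


default = SketchFactor()
overridden = SketchFactor(inputs=["high", "low"])

print(default.inputs, default.window_length)        # ('close', 'open') 10
print(overridden.inputs, overridden.window_length)  # ('high', 'low') 10
print(SketchFactor.inputs)                          # ('close', 'open')
```

Note that overriding inputs leaves window_length at its default, and that the class-level defaults themselves are untouched, so later instances created without arguments still use close and open.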
Pipeline computes the factor's actual value on every trading day.
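Note that a factor's window always ends on the previous trading day. To use the lowest closing price of the past 10 days as a factor, we can build TenDaysLowest. The resulting table holds, for each day and each stock, the lowest close over the preceding ten trading days. Taking 2013-03-19 as an example, the window starts at 2013-03-18 and extends back ten trading days.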
class TenDaysLowest(CustomFactor):
    inputs = [TWEquityPricing.close]
    window_length = 10
    def compute(self, today, assets, out, close):
        # Lowest close in the window, per asset, ignoring NaNs
        out[:] = np.nanmin(close, axis=0)

def make_pipeline():
    tendl = TenDaysLowest()
    return Pipeline(
        columns={
            'TenDaysLowest': tendl
        }
    )

results = run_pipeline(make_pipeline(), pd.Timestamp('2013-03-18', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))
results
date | asset | TenDaysLowest
---|---|---
2013-03-18 00:00:00+00:00 | Equity(0 [2330]) | 102.00
2013-03-18 00:00:00+00:00 | Equity(1 [2409]) | 12.70
2013-03-19 00:00:00+00:00 | Equity(0 [2330]) | 100.50
2013-03-19 00:00:00+00:00 | Equity(1 [2409]) | 12.65
2013-03-20 00:00:00+00:00 | Equity(0 [2330]) | 100.00
... | ... | ...
2022-12-29 00:00:00+00:00 | Equity(1 [2409]) | 14.65
2022-12-30 00:00:00+00:00 | Equity(0 [2330]) | 446.00
2022-12-30 00:00:00+00:00 | Equity(1 [2409]) | 14.65
2023-01-03 00:00:00+00:00 | Equity(0 [2330]) | 446.00
2023-01-03 00:00:00+00:00 | Equity(1 [2409]) | 14.65
4814 rows × 1 columns
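The table above shows that TSMC's TenDaysLowest on 2013-03-19 is 100.5. The table below confirms that the lowest close between 2013-03-05 and 2013-03-18 is indeed 100.5, not the 100 recorded on 2013-03-19 itself. The pipeline therefore starts its window on the previous day when computing a factor, which avoids look-ahead bias.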
from zipline.data.data_portal import DataPortal, get_bundle
df_bundle = get_bundle(bundle_name='tquant',
                       calendar_name='TEJ',
                       start_dt=pd.Timestamp('2013-01-05', tz='UTC'),
                       end_dt=pd.Timestamp('2023-01-03', tz='UTC'))
df_bundle.loc[(df_bundle['symbol']=='2330') & (df_bundle['date'].between('2013-03-04','2013-03-19'))][["date", 'close']]
index | date | close
---|---|---
66 | 2013-03-04 00:00:00+00:00 | 102.0
68 | 2013-03-05 00:00:00+00:00 | 104.0
70 | 2013-03-06 00:00:00+00:00 | 104.0
72 | 2013-03-07 00:00:00+00:00 | 103.0
74 | 2013-03-08 00:00:00+00:00 | 103.5
76 | 2013-03-11 00:00:00+00:00 | 102.0
78 | 2013-03-12 00:00:00+00:00 | 102.5
80 | 2013-03-13 00:00:00+00:00 | 104.5
82 | 2013-03-14 00:00:00+00:00 | 104.0
84 | 2013-03-15 00:00:00+00:00 | 103.0
86 | 2013-03-18 00:00:00+00:00 | 100.5
88 | 2013-03-19 00:00:00+00:00 | 100.0
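The same check can be reproduced outside the pipeline. The sketch below uses pandas alone, with the TSMC closing prices copied from the table above, and compares a 10-day rolling minimum that includes the current day with one shifted back by a trading day. This is only an approximation of the pipeline's windowing for illustration, not its implementation.

```python
import pandas as pd

# TSMC (2330) closing prices copied from the table above.
close = pd.Series(
    [102.0, 104.0, 104.0, 103.0, 103.5, 102.0,
     102.5, 104.5, 104.0, 103.0, 100.5, 100.0],
    index=pd.to_datetime([
        "2013-03-04", "2013-03-05", "2013-03-06", "2013-03-07",
        "2013-03-08", "2013-03-11", "2013-03-12", "2013-03-13",
        "2013-03-14", "2013-03-15", "2013-03-18", "2013-03-19",
    ]),
)

# Rolling 10-day minimum that includes the current day's close.
same_day = close.rolling(10).min()

# Shift by one trading day so each date only sees the prior 10 closes.
prior_only = same_day.shift(1)

print(prior_only.loc[pd.Timestamp("2013-03-18")])  # 102.0
print(prior_only.loc[pd.Timestamp("2013-03-19")])  # 100.5
print(same_day.loc[pd.Timestamp("2013-03-19")])    # 100.0
```

The shifted series matches the TenDaysLowest values reported for 2013-03-18 and 2013-03-19, while the unshifted one would already reflect the 2013-03-19 close of 100.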