!pip install wrds
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ Collecting wrds Downloading wrds-3.1.1-py3-none-any.whl (12 kB) Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from wrds) (1.21.6) Collecting psycopg2-binary Downloading psycopg2_binary-2.9.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB) |████████████████████████████████| 3.0 MB 4.5 MB/s Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from wrds) (1.3.5) Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.7/dist-packages (from wrds) (1.4.36) Collecting mock Downloading mock-4.0.3-py3-none-any.whl (28 kB) Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->wrds) (2.8.2) Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->wrds) (2022.1) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->wrds) (1.15.0) Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.7/dist-packages (from sqlalchemy->wrds) (1.1.2) Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from sqlalchemy->wrds) (4.11.3) Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->sqlalchemy->wrds) (4.2.0) Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->sqlalchemy->wrds) (3.8.0) Installing collected packages: psycopg2-binary, mock, wrds Successfully installed mock-4.0.3 psycopg2-binary-2.9.3 wrds-3.1.1
The following series of stand-alone queries represents a basic example Python workflow using WRDS data. The commands in this workflow could be run interactively or submitted via a batch job using Python in the WRDS Cloud, or run locally from your computer using a Jupyter notebook. For this example, we'll use the CRSP Daily Stock File (crsp.dsf
) data library
In real-world Python programming, you would probably use the results of these queries to perform additional statistical analysis in your program. This document is meant as an introductory walkthrough.
First, as with every Python program that intends to connect to WRDS, we must import the wrds module and make our connection:
We must also have set up our pgpass file as described earlier in the documentation.
Note: Class accounts and IPAuth / Daypass accounts are not permitted to access WRDS in this manner and will receive an error if trying this connection. You must have your own, dedicated WRDS account in order to access WRDS from MATLAB.
Let's get started. The initial queries (metadata queries) have dedicated wrds module methods to give you the results you're looking for. Later queries (data queries) will use raw_sql()
exclusively.
import wrds
db = wrds.Connection()
db.list_libraries()
['compbd', 'compgd', 'compnad', 'compsegd', 'crspa', 'evts', 'issm', 'nastraq', 'wrds_lib_internal', 'tass', 'aha', 'estimize', 'toyo', 'levin', 'ahasamp', 'crspm', 'tresgsmp', 'trsdcgs', 'columnar', 'trws', 'lspd', 'twoiq', 'wcai', 'csmar', 'wind', 'wqa', 'tresg', 'markit', 'msrb_all', 'risk', 'custom_jl', 'centris', 'block_all', 'zacks', 'boardex_trial', 'audit', 'comp_na_monthly_all', 'emdb', 'auditsmp', 'boardsmp', 'calcbnch', 'comp_segments_hist', 'wrdsrpts_rep_usage', 'public_all', 'pwt_all', 'msfanly', 'etfg', 'cboe_all', 'wrdsapps_backtest_plus', 'pacap', 'ciqsamp_transcripts', 'ciq', 'tr_ds', 'ppublica', 'preqin', 'crsp_a_stock', 'fisdsamp', 'etfg_samp', 'windsmp', 'rent', 'wrdsapps_evtstudy_int_ginsight', 'compmcur', 'secsamp_all', 'rpna', 'ibes', 'rpa', 'wrdssec_midas', 'sdcsamp', 'cisdmsmp', 'snl', 'sprat', 'ciqsamp', 'cboe', 'sustain', 'trucost', 'sdc', 'fjc_linking', 'otc_endofday', 'optionm', 'compa', 'bank', 'blab', 'block', 'compb', 'compg', 'taqmsec', 'comp_execucomp', 'trsamp_ds_eq', 'contrib_ceo_turnover', 'cisdm', 'clrvt', 'clrvtsmp', 'compsamp', 'trdstrm', 'compm', 'compseg', 'comph', 'compsnap', 'comscore', 'sustainalyticssamp_all', 'ff_all', 'pitchbk', 'crspsamp', 'dmef', 'eursamp', 'contrib_kpss', 'auditsmp_all', 'ciqsamp_pplintel', 'execcomp', 'taqmsamp_all', 'crspq', 'boardex', 'ftse', 'cusipm', 'eureka', 'eventus', 'ftsesamp', 'ginsight', 'ktsamp', 'factset', 'ifgrsamp', 'imssamp', 'gmi', 'fisd', 'ff', 'fssamp', 'govpxsmp', 'hbsamp', 'hbase', 'ibeskpi', 'ifgr', 'ibescorp', 'mrktsamp_msf', 'iri', 'kld', 'govpx', 'hfr', 'lspdsamp', 'ims', 'ktmine', 'taq', 'aha_sample', 'calcbench_trial', 'ppubsamp_d4d', 'iss', 'ciqsamp_common', 'comp_na_daily_all', 'compdcur', 'otc', 'mrktsamp', 'contrib_general', 'preqsamp_all', 'twoiq_samp', 'msrb', 'msrbsamp', 'mrktsamp_cdx', 'phlx_all', 'zacksamp_all', 'fjc_litigation', 'mfl', 'etfgsamp', 'sustsamp', 'phlx', 'ppubsamp', 'preqsamp', 'wrdsapps_evtstudy_lr', 'ciqsamp_keydev', 'crsp_a_ccm', 'crsp', 'tfn', 'wappsamp', 'revere', 'toyosamp', 'zacksamp', 'wrdsrpts', 'eurekahedge_sample', 'ims_obp_trial', 'comp_na_annual_all', 'compsamp_snapshot', 'rq_all', 'comp_bank_daily', 'infogroupsamp_business', 'comp_segments_hist_daily', 'infogroupsamp_residential', 'ktmine_patents_samp', 'djones_all', 'doe_all', 'factsamp_all', 'wrdsapps_eushort', 'wrdsapps_subsidiary', 'wrdsappssamp_all', 'wrdsapps_link_datastream_wscope', 'wrdsapps_patents', 'wrdsapps_link_comp_eushort', 'dmef_all', 'macrofin_comm_trade', 'mrktsamp_cds', 'pitchsmp', 'totalq_all', 'toyosamp_all', 'trsamp_dscom', 'trsamp_dsecon', 'reprisk', 'rpnasamp', 'comp', 'audit_corp_legal', 'trcstsmp', 'trsamp_dsfut', 'wrdsapps_backtest_basic', 'wrdsapps_link_supplychain', 'wrdsapps', 'wrdssec', 'audit_common', 'contrib_intangible_value', 'ciqsamp_capstrct', 'contrib_char_returns', 'tr_dealscan', 'crsp_q_indexhist', 'bvd', 'hfrsamp_hfrdb', 'factsamp_revere', 'frb_all', 'trown', 'hbase_sample', 'secsamp', 'lvnsamp_all', 'optionmsamp_europe', 'optionmsamp_us', 'reprisk_sample', 'risksamp_all', 'snlsamp_fig', 'ravenpack_trial', 'wrdsapps_finratio_ccm', 'trace_standard', 'trace_enhanced', 'bvdsamp', 'contrib', 'wrdsapps_link_crsp_bond', 'djones', 'doe', 'frb', 'hfrsamp', 'lvnsamp', 'macrofin', 'midas', 'omtrial', 'public', 'pwt', 'repsamp', 'risksamp', 'snapsamp', 'snlsamp', 'taqsamp', 'totalq', 'trdssamp', 'fjc', 'trace', 'wrdsapps_link_crsp_factset', 'wrdsapps_link_crsp_taq', 'comp_bank', 'trsamp', 'twoiqsmp', 'audit_audit_comp', 'wrdsapps_finratio', 'dealscan', 'crsp_a_indexes', 'msfinst', 'taqmsamp']
db.list_tables('crsp')
['acti', 'asia', 'asib', 'asic', 'asio', 'asix', 'bmdebt', 'bmheader', 'bmpaymts', 'bmquotes', 'bmyield', 'bndprt06', 'bndprt12', 'bxcalind', 'bxdlyind', 'bxmthind', 'bxquotes', 'bxyield', 'cap', 'ccm_lookup', 'ccm_qvards', 'ccmxpf_linktable', 'ccmxpf_lnkhist', 'ccmxpf_lnkrng', 'ccmxpf_lnkused', 'comphead', 'comphist', 'compmaster', 'contact_info', 'crsp_cik_map', 'crsp_daily_data', 'crsp_header', 'crsp_monthly_data', 'crsp_names', 'crsp_portno_map', 'crsp_ziman_daily_index', 'crsp_ziman_monthly_index', 'cs20yr', 'cs5yr', 'cs90d', 'cst_hist', 'daily_nav', 'daily_nav_ret', 'daily_returns', 'dividends', 'dport1', 'dport2', 'dport3', 'dport4', 'dport5', 'dport6', 'dport7', 'dport8', 'dport9', 'dsbc', 'dsbo', 'dse', 'dse62', 'dse62delist', 'dse62dist', 'dse62exchdates', 'dse62names', 'dse62nasdin', 'dse62shares', 'dseall', 'dseall62', 'dsedelist', 'dsedist', 'dseexchdates', 'dsenames', 'dsenasdin', 'dseshares', 'dsf', 'dsf62', 'dsfhdr', 'dsfhdr62', 'dsi', 'dsi62', 'dsia', 'dsib', 'dsic', 'dsio', 'dsir', 'dsix', 'dsiy', 'dsp500', 'dsp500list', 'dsp500p', 'dssc', 'dsso', 'eod_cap', 'eod_sector', 'eod_vg', 'erdport1', 'erdport2', 'erdport3', 'erdport4', 'erdport5', 'erdport6', 'erdport7', 'erdport8', 'erdport9', 'ermport1', 'ermport2', 'ermport3', 'ermport4', 'ermport5', 'fbpri', 'fbyld', 'front_load', 'front_load_det', 'front_load_grp', 'fund_fees', 'fund_flows', 'fund_hdr', 'fund_hdr_hist', 'fund_names', 'fund_style', 'fund_summary', 'fund_summary2', 'fwdask06', 'fwdask12', 'fwdave06', 'fwdave12', 'fwdbid06', 'fwdbid12', 'hldask06', 'hldask12', 'hldave06', 'hldave12', 'hldbid06', 'hldbid12', 'holdings', 'holdings_co_info', 'index_descriptions', 'index_type_map', 'mbi', 'mbmdat', 'mbmhdr', 'mbx', 'mbxid', 'mcti', 'mfdbname', 'mhista', 'mhistn', 'mhistq', 'monthly_nav', 'monthly_returns', 'monthly_tna', 'monthly_tna_ret_nav', 'mport1', 'mport2', 'mport3', 'mport4', 'mport5', 'mse', 'mse62', 'mse62delist', 'mse62dist', 'mse62exchdates', 'mse62names', 'mse62nasdin', 'mse62shares', 'mseall', 'mseall62', 'msedelist', 'msedist', 'mseexchdates', 'msenames', 'msenasdin', 'mseshares', 'msf', 'msf62', 'msfhdr', 'msfhdr62', 'msi', 'msi62', 'msia', 'msib', 'msic', 'msio', 'msir', 'msix', 'msiy', 'msp500', 'msp500list', 'msp500p', 'portnomap', 'priask06', 'priask12', 'priave06', 'priave12', 'pribid06', 'pribid12', 'price_type', 'property_type', 'qcti', 'qsia', 'qsib', 'qsic', 'qsio', 'qsix', 'rear_load', 'rear_load_det', 'rear_load_grp', 'rebala', 'rebaln', 'rebalq', 'reit_type', 'riskfree', 's6z_agg_ann', 's6z_agg_mth', 's6z_agg_qtr', 's6z_del', 's6z_dind', 's6z_dis', 's6z_dp_dly', 's6z_ds_dly', 's6z_hdr', 's6z_indhdr', 's6z_mdel', 's6z_mind', 's6z_mth', 's6z_nam', 's6z_ndi', 's6z_shr', 'saz_agg_ann', 'saz_agg_mth', 'saz_agg_qtr', 'saz_del', 'saz_dind', 'saz_dis', 'saz_dp_dly', 'saz_ds_dly', 'saz_hdr', 'saz_indhdr', 'saz_mdel', 'saz_mind', 'saz_mth', 'saz_nam', 'saz_ndi', 'saz_shr', 'sechead', 'sechist', 'sector', 'sfz_dind', 'sfz_indhdr', 'sfz_mbr', 'sfz_mind', 'sfz_portd', 'sfz_portm', 'sfz_rb', 'stock_qvards', 'stocknames', 'stocknames62', 'sub_property_type', 'tfz_dly', 'tfz_dly_cd', 'tfz_dly_cpi', 'tfz_dly_ft', 'tfz_dly_rf2', 'tfz_dly_ts2', 'tfz_idx', 'tfz_iss', 'tfz_mast', 'tfz_mth', 'tfz_mth_bp', 'tfz_mth_cd', 'tfz_mth_cpi', 'tfz_mth_fb', 'tfz_mth_ft', 'tfz_mth_rf', 'tfz_mth_rf2', 'tfz_mth_ts', 'tfz_mth_ts2', 'tfz_pay', 'vg', 'yldask06', 'yldask12', 'yldave06', 'yldave12', 'yldbid06', 'yldbid12', 'ziman_reit_info', 'zr_hdrnames']
dsf
) dataset, and take a look at the list of available data variables (column names) in that dataset:db.describe_table('crsp', 'dsf')
Approximately 101070400 rows in crsp.dsf.
name | nullable | type | |
---|---|---|---|
0 | cusip | True | VARCHAR(8) |
1 | permno | True | DOUBLE_PRECISION |
2 | permco | True | DOUBLE_PRECISION |
3 | issuno | True | DOUBLE_PRECISION |
4 | hexcd | True | DOUBLE_PRECISION |
5 | hsiccd | True | DOUBLE_PRECISION |
6 | date | True | DATE |
7 | bidlo | True | DOUBLE_PRECISION |
8 | askhi | True | DOUBLE_PRECISION |
9 | prc | True | DOUBLE_PRECISION |
10 | vol | True | DOUBLE_PRECISION |
11 | ret | True | DOUBLE_PRECISION |
12 | bid | True | DOUBLE_PRECISION |
13 | ask | True | DOUBLE_PRECISION |
14 | shrout | True | DOUBLE_PRECISION |
15 | cfacpr | True | DOUBLE_PRECISION |
16 | cfacshr | True | DOUBLE_PRECISION |
17 | openprc | True | DOUBLE_PRECISION |
18 | numtrd | True | DOUBLE_PRECISION |
19 | retx | True | DOUBLE_PRECISION |
db.describe_table('crsp', 'dsi')
Approximately 25360 rows in crsp.dsi.
name | nullable | type | |
---|---|---|---|
0 | date | True | DATE |
1 | vwretd | True | DOUBLE_PRECISION |
2 | vwretx | True | DOUBLE_PRECISION |
3 | ewretd | True | DOUBLE_PRECISION |
4 | ewretx | True | DOUBLE_PRECISION |
5 | sprtrn | True | DOUBLE_PRECISION |
6 | spindx | True | DOUBLE_PRECISION |
7 | totval | True | DOUBLE_PRECISION |
8 | totcnt | True | DOUBLE_PRECISION |
9 | usdval | True | DOUBLE_PRECISION |
10 | usdcnt | True | DOUBLE_PRECISION |
db.describe_table('crsp', 'crsp_header')
Approximately 673 rows in crsp.crsp_header.
name | nullable | type | |
---|---|---|---|
0 | permno | True | DOUBLE_PRECISION |
1 | permco | True | DOUBLE_PRECISION |
2 | begdt | True | DATE |
3 | enddt | True | DATE |
4 | comnam | True | VARCHAR(64) |
5 | hdlstcd | True | DOUBLE_PRECISION |
db.describe_table('crsp', 'comphead')
Approximately 51232 rows in crsp.comphead.
name | nullable | type | |
---|---|---|---|
0 | gvkey | True | VARCHAR(6) |
1 | cik | True | VARCHAR(10) |
2 | ein | True | VARCHAR(10) |
3 | conm | True | VARCHAR(256) |
4 | costat | True | VARCHAR(1) |
5 | dlrsn | True | VARCHAR(8) |
6 | priusa | True | VARCHAR(8) |
7 | prican | True | VARCHAR(8) |
8 | prirow | True | VARCHAR(8) |
9 | idbflag | True | VARCHAR(1) |
10 | fic | True | VARCHAR(3) |
11 | loc | True | VARCHAR(3) |
12 | incorp | True | VARCHAR(8) |
13 | state | True | VARCHAR(8) |
14 | county | True | VARCHAR(100) |
15 | city | True | VARCHAR(104) |
16 | naics | True | VARCHAR(8) |
17 | conml | True | VARCHAR(100) |
18 | weburl | True | VARCHAR(60) |
19 | phone | True | VARCHAR(18) |
20 | fax | True | VARCHAR(18) |
21 | add1 | True | VARCHAR(65) |
22 | add2 | True | VARCHAR(65) |
23 | add3 | True | VARCHAR(65) |
24 | add4 | True | VARCHAR(65) |
25 | addzip | True | VARCHAR(24) |
26 | busdesc | True | VARCHAR(2000) |
27 | ipodate | True | DATE |
28 | dldte | True | DATE |
29 | stko | True | DOUBLE_PRECISION |
30 | fyrc | True | DOUBLE_PRECISION |
31 | sic | True | DOUBLE_PRECISION |
32 | gsector | True | DOUBLE_PRECISION |
33 | ggroup | True | DOUBLE_PRECISION |
34 | gind | True | DOUBLE_PRECISION |
35 | gsubind | True | DOUBLE_PRECISION |
36 | spcindcd | True | DOUBLE_PRECISION |
37 | spcseccd | True | DOUBLE_PRECISION |
db.describe_table('crsp', 'crsp_names')
Approximately 2642 rows in crsp.crsp_names.
name | nullable | type | |
---|---|---|---|
0 | permno | True | DOUBLE_PRECISION |
1 | namebegdt | True | DATE |
2 | nameenddt | True | DATE |
3 | ticker | True | VARCHAR(8) |
4 | shrcls | True | VARCHAR(4) |
5 | comnam | True | VARCHAR(64) |
6 | exchcd | True | DOUBLE_PRECISION |
7 | shrcd | True | DOUBLE_PRECISION |
8 | siccd | True | DOUBLE_PRECISION |
9 | primexch | True | VARCHAR(1) |
10 | tsymbol | True | VARCHAR(12) |
11 | secstat | True | VARCHAR(1) |
12 | trdstat | True | VARCHAR(1) |
13 | namedesc | True | VARCHAR(24) |
14 | issuercd | True | VARCHAR(1) |
15 | convcd | True | VARCHAR(1) |
db.describe_table('crsp', 'stocknames')
Approximately 77779 rows in crsp.stocknames.
name | nullable | type | |
---|---|---|---|
0 | permno | True | DOUBLE_PRECISION |
1 | namedt | True | DATE |
2 | nameenddt | True | DATE |
3 | shrcd | True | DOUBLE_PRECISION |
4 | exchcd | True | DOUBLE_PRECISION |
5 | siccd | True | DOUBLE_PRECISION |
6 | ncusip | True | VARCHAR(8) |
7 | ticker | True | VARCHAR(8) |
8 | comnam | True | VARCHAR(35) |
9 | shrcls | True | VARCHAR(4) |
10 | permco | True | DOUBLE_PRECISION |
11 | hexcd | True | DOUBLE_PRECISION |
12 | cusip | True | VARCHAR(8) |
13 | st_date | True | DATE |
14 | end_date | True | DATE |
15 | namedum | True | DOUBLE_PRECISION |
stocknames=db.get_table('crsp', 'stocknames')
stocknames
permno | namedt | nameenddt | shrcd | exchcd | siccd | ncusip | ticker | comnam | shrcls | permco | hexcd | cusip | st_date | end_date | namedum | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10000.0 | 1986-01-07 | 1987-06-11 | 10.0 | 3.0 | 3990.0 | 68391610 | OMFGA | OPTIMUM MANUFACTURING INC | A | 7952.0 | 3.0 | 68391610 | 1986-01-07 | 1987-06-11 | 2.0 |
1 | 10001.0 | 1986-01-09 | 1993-11-21 | 11.0 | 3.0 | 4920.0 | 39040610 | GFGC | GREAT FALLS GAS CO | None | 7953.0 | 2.0 | 36720410 | 1986-01-09 | 2017-08-03 | 2.0 |
2 | 10001.0 | 1993-11-22 | 2008-02-04 | 11.0 | 3.0 | 4920.0 | 29274A10 | EWST | ENERGY WEST INC | None | 7953.0 | 2.0 | 36720410 | 1986-01-09 | 2017-08-03 | 2.0 |
3 | 10001.0 | 2008-02-05 | 2009-08-03 | 11.0 | 3.0 | 4920.0 | 29274A20 | EWST | ENERGY WEST INC | None | 7953.0 | 2.0 | 36720410 | 1986-01-09 | 2017-08-03 | 2.0 |
4 | 10001.0 | 2009-08-04 | 2009-12-17 | 11.0 | 3.0 | 4920.0 | 29269V10 | EGAS | ENERGY INC | None | 7953.0 | 2.0 | 36720410 | 1986-01-09 | 2017-08-03 | 2.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
77774 | 93433.0 | 2013-04-24 | 2016-12-22 | 11.0 | 3.0 | 9999.0 | 92870X30 | VLTC | VOLTARI CORP | None | 53451.0 | 3.0 | 92870X30 | 2010-06-18 | 2016-12-22 | 2.0 |
77775 | 93434.0 | 2010-06-14 | 2022-03-31 | 11.0 | 3.0 | 9999.0 | 78513510 | SANW | S & W SEED CO | None | 53427.0 | 3.0 | 78513510 | 2010-06-14 | 2022-03-31 | 2.0 |
77776 | 93435.0 | 2010-06-14 | 2012-05-18 | 11.0 | 3.0 | 6163.0 | 82936G20 | SCEI | SINO CLEAN ENERGY INC | None | 53452.0 | 3.0 | 82936G20 | 2010-06-14 | 2012-05-18 | 2.0 |
77777 | 93436.0 | 2010-06-29 | 2017-02-01 | 11.0 | 3.0 | 9999.0 | 88160R10 | TSLA | TESLA MOTORS INC | None | 53453.0 | 3.0 | 88160R10 | 2010-06-29 | 2022-03-31 | 2.0 |
77778 | 93436.0 | 2017-02-02 | 2022-03-31 | 11.0 | 3.0 | 9999.0 | 88160R10 | TSLA | TESLA INC | None | 53453.0 | 3.0 | 88160R10 | 2010-06-29 | 2022-03-31 | 2.0 |
77779 rows × 16 columns
db
<wrds.sql.Connection at 0x7f3a1a602950>
stocknames.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 77779 entries, 0 to 77778 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 permno 77779 non-null float64 1 namedt 77779 non-null object 2 nameenddt 77779 non-null object 3 shrcd 77779 non-null float64 4 exchcd 77779 non-null float64 5 siccd 77779 non-null float64 6 ncusip 72335 non-null object 7 ticker 72881 non-null object 8 comnam 77779 non-null object 9 shrcls 5108 non-null object 10 permco 77779 non-null float64 11 hexcd 77779 non-null float64 12 cusip 77779 non-null object 13 st_date 77779 non-null object 14 end_date 77779 non-null object 15 namedum 77779 non-null float64 dtypes: float64(7), object(9) memory usage: 9.5+ MB
from google.colab import drive
drive.mount("/content/gdrive")
Mounted at /content/gdrive
cd /content/gdrive/My Drive/Colab Notebooks/
/content/gdrive/My Drive/Colab Notebooks
stocknames.to_csv('crsp_stocknames.csv')
ls
Both get_table() and raw_sql() can be used for this:
db.get_table('crsp', 'dsf', obs=100)
cusip | permno | permco | issuno | hexcd | hsiccd | date | bidlo | askhi | prc | vol | ret | bid | ask | shrout | cfacpr | cfacshr | openprc | numtrd | retx | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-05 | 0.5000 | 0.6090 | 0.5999 | 9986.0 | -0.045961 | 0.5999 | 0.6000 | 9296.0 | 1.0 | 1.0 | 0.5000 | 39.0 | -0.045961 |
1 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-08 | 0.4250 | 0.6010 | 0.5700 | 31417.0 | -0.049842 | 0.5699 | 0.5760 | 9296.0 | 1.0 | 1.0 | 0.6000 | 53.0 | -0.049842 |
2 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-09 | 0.5101 | 0.5788 | 0.5788 | 9362.0 | 0.015439 | 0.5400 | 0.5770 | 9296.0 | 1.0 | 1.0 | 0.5200 | 35.0 | 0.015439 |
3 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-10 | 0.5419 | 0.6000 | 0.5700 | 3878.0 | -0.015204 | 0.5699 | 0.5700 | 9296.0 | 1.0 | 1.0 | 0.6000 | 14.0 | -0.015204 |
4 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-11 | 0.5200 | 0.5600 | 0.5299 | 7930.0 | -0.070351 | 0.5299 | 0.5300 | 9296.0 | 1.0 | 1.0 | 0.5600 | 21.0 | -0.070351 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-21 | 18.8125 | 19.3125 | 19.0000 | 340136.0 | 0.006623 | 18.9375 | 19.1875 | 56689.0 | 1.0 | 1.0 | 19.0000 | 76.0 | 0.006623 |
96 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-24 | 18.6250 | 19.2500 | 18.7500 | 173722.0 | -0.013158 | 18.7500 | 18.8750 | 56689.0 | 1.0 | 1.0 | 19.0000 | 51.0 | -0.013158 |
97 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-25 | 18.1250 | 18.7500 | 18.1875 | 322610.0 | -0.030000 | 18.0000 | 18.1875 | 56689.0 | 1.0 | 1.0 | 18.7500 | 98.0 | -0.030000 |
98 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-26 | 17.8750 | 18.2500 | 18.0625 | 357611.0 | -0.006873 | 18.0000 | 18.0625 | 56689.0 | 1.0 | 1.0 | 18.0000 | 130.0 | -0.006873 |
99 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-27 | 17.8125 | 18.1875 | 17.9375 | 50878.0 | -0.006920 | 17.8125 | 17.9375 | 56689.0 | 1.0 | 1.0 | 18.1875 | 42.0 | -0.006920 |
100 rows × 20 columns
db.raw_sql('select * from crsp.dsf LIMIT 100')
cusip | permno | permco | issuno | hexcd | hsiccd | date | bidlo | askhi | prc | vol | ret | bid | ask | shrout | cfacpr | cfacshr | openprc | numtrd | retx | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-05 | 0.5000 | 0.6090 | 0.5999 | 9986.0 | -0.045961 | 0.5999 | 0.6000 | 9296.0 | 1.0 | 1.0 | 0.5000 | 39.0 | -0.045961 |
1 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-08 | 0.4250 | 0.6010 | 0.5700 | 31417.0 | -0.049842 | 0.5699 | 0.5760 | 9296.0 | 1.0 | 1.0 | 0.6000 | 53.0 | -0.049842 |
2 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-09 | 0.5101 | 0.5788 | 0.5788 | 9362.0 | 0.015439 | 0.5400 | 0.5770 | 9296.0 | 1.0 | 1.0 | 0.5200 | 35.0 | 0.015439 |
3 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-10 | 0.5419 | 0.6000 | 0.5700 | 3878.0 | -0.015204 | 0.5699 | 0.5700 | 9296.0 | 1.0 | 1.0 | 0.6000 | 14.0 | -0.015204 |
4 | 75747910 | 88371.0 | 37740.0 | 23794.0 | 3.0 | 7379.0 | 2016-02-11 | 0.5200 | 0.5600 | 0.5299 | 7930.0 | -0.070351 | 0.5299 | 0.5300 | 9296.0 | 1.0 | 1.0 | 0.5600 | 21.0 | -0.070351 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-21 | 18.8125 | 19.3125 | 19.0000 | 340136.0 | 0.006623 | 18.9375 | 19.1875 | 56689.0 | 1.0 | 1.0 | 19.0000 | 76.0 | 0.006623 |
96 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-24 | 18.6250 | 19.2500 | 18.7500 | 173722.0 | -0.013158 | 18.7500 | 18.8750 | 56689.0 | 1.0 | 1.0 | 19.0000 | 51.0 | -0.013158 |
97 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-25 | 18.1250 | 18.7500 | 18.1875 | 322610.0 | -0.030000 | 18.0000 | 18.1875 | 56689.0 | 1.0 | 1.0 | 18.7500 | 98.0 | -0.030000 |
98 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-26 | 17.8750 | 18.2500 | 18.0625 | 357611.0 | -0.006873 | 18.0000 | 18.0625 | 56689.0 | 1.0 | 1.0 | 18.0000 | 130.0 | -0.006873 |
99 | 91705J20 | 88372.0 | 16512.0 | 23775.0 | 3.0 | 4832.0 | 2000-07-27 | 17.8125 | 18.1875 | 17.9375 | 50878.0 | -0.006920 | 17.8125 | 17.9375 | 56689.0 | 1.0 | 1.0 | 18.1875 | 42.0 | -0.006920 |
100 rows × 20 columns
db.get_table('crsp', 'dsf', columns=['cusip, permno, date, bidlo, askhi'], obs=100, )
cusip | permno | date | bidlo | askhi | |
---|---|---|---|---|---|
0 | 75747910 | 88371.0 | 2016-02-05 | 0.5000 | 0.6090 |
1 | 75747910 | 88371.0 | 2016-02-08 | 0.4250 | 0.6010 |
2 | 75747910 | 88371.0 | 2016-02-09 | 0.5101 | 0.5788 |
3 | 75747910 | 88371.0 | 2016-02-10 | 0.5419 | 0.6000 |
4 | 75747910 | 88371.0 | 2016-02-11 | 0.5200 | 0.5600 |
... | ... | ... | ... | ... | ... |
95 | 91705J20 | 88372.0 | 2000-07-21 | 18.8125 | 19.3125 |
96 | 91705J20 | 88372.0 | 2000-07-24 | 18.6250 | 19.2500 |
97 | 91705J20 | 88372.0 | 2000-07-25 | 18.1250 | 18.7500 |
98 | 91705J20 | 88372.0 | 2000-07-26 | 17.8750 | 18.2500 |
99 | 91705J20 | 88372.0 | 2000-07-27 | 17.8125 | 18.1875 |
100 rows × 5 columns
db.raw_sql('select cusip, permno, date, bidlo, askhi from crsp.dsf LIMIT 100')
cusip | permno | date | bidlo | askhi | |
---|---|---|---|---|---|
0 | 75747910 | 88371.0 | 2016-02-05 | 0.5000 | 0.6090 |
1 | 75747910 | 88371.0 | 2016-02-08 | 0.4250 | 0.6010 |
2 | 75747910 | 88371.0 | 2016-02-09 | 0.5101 | 0.5788 |
3 | 75747910 | 88371.0 | 2016-02-10 | 0.5419 | 0.6000 |
4 | 75747910 | 88371.0 | 2016-02-11 | 0.5200 | 0.5600 |
... | ... | ... | ... | ... | ... |
95 | 91705J20 | 88372.0 | 2000-07-21 | 18.8125 | 19.3125 |
96 | 91705J20 | 88372.0 | 2000-07-24 | 18.6250 | 19.2500 |
97 | 91705J20 | 88372.0 | 2000-07-25 | 18.1250 | 18.7500 |
98 | 91705J20 | 88372.0 | 2000-07-26 | 17.8750 | 18.2500 |
99 | 91705J20 | 88372.0 | 2000-07-27 | 17.8125 | 18.1875 |
100 rows × 5 columns
Since you are now filtering by specific values for variables, use raw_sql() exclusively. All future examples will be shown using raw_sql().
NOTE: Use single quotation marks around the date value and double quotation marks elsewhere as shown below.
db.raw_sql("select cusip, permno, date, bidlo, askhi "
"from crsp.dsf "
"where permno in (14593, 90319, 12490, 17778) "
"and date='2013-01-04'")
cusip | permno | date | bidlo | askhi | |
---|---|---|---|---|---|
0 | 45920010 | 12490.0 | 2013-01-04 | 192.779999 | 194.460007 |
1 | 03783310 | 14593.0 | 2013-01-04 | 525.828613 | 538.629883 |
2 | 08467010 | 17778.0 | 2013-01-04 | 140047.000000 | 141003.796875 |
3 | 02079K30 | 90319.0 | 2013-01-04 | 727.680115 | 741.469971 |
db.raw_sql("select cusip,permno,date,bidlo,askhi from crsp.dsf where permno in (14593, 90319, 12490, 17778) and date between '2010-01-01' and '2013-12-31' and askhi > 2000")
cusip | permno | date | bidlo | askhi | |
---|---|---|---|---|---|
0 | 08467010 | 17778.0 | 2010-01-04 | 99201.00000 | 99910.0 |
1 | 08467010 | 17778.0 | 2010-01-05 | 99550.00000 | 100001.0 |
2 | 08467010 | 17778.0 | 2010-01-06 | 99500.00000 | 100000.0 |
3 | 08467010 | 17778.0 | 2010-01-07 | 99594.00000 | 100000.0 |
4 | 08467010 | 17778.0 | 2010-01-08 | 99700.00000 | 100300.0 |
... | ... | ... | ... | ... | ... |
1001 | 08467010 | 17778.0 | 2013-12-24 | 175555.09375 | 176063.0 |
1002 | 08467010 | 17778.0 | 2013-12-26 | 175610.00000 | 176900.0 |
1003 | 08467010 | 17778.0 | 2013-12-27 | 176678.00000 | 177320.0 |
1004 | 08467010 | 17778.0 | 2013-12-30 | 176655.00000 | 177685.0 |
1005 | 08467010 | 17778.0 | 2013-12-31 | 176793.00000 | 177950.0 |
1006 rows × 5 columns
db.raw_sql('select distinct permno from crsp.dsf where askhi > 2000')
permno | |
---|---|
0 | 14542.0 |
1 | 14752.0 |
2 | 15395.0 |
3 | 16280.0 |
4 | 17486.0 |
5 | 17778.0 |
6 | 17881.0 |
7 | 21709.0 |
8 | 36281.0 |
9 | 46068.0 |
10 | 76605.0 |
11 | 79785.0 |
12 | 83443.0 |
13 | 84788.0 |
14 | 86783.0 |
15 | 90319.0 |
16 | 92221.0 |
17 | 93436.0 |
db.raw_sql("select distinct date,permno from crsp.dsf where askhi > 2000 order by date")
date | permno | |
---|---|---|
0 | 1926-12-23 | 14752.0 |
1 | 1926-12-24 | 14752.0 |
2 | 1927-01-07 | 14752.0 |
3 | 1927-01-08 | 14752.0 |
4 | 1948-05-15 | 16280.0 |
... | ... | ... |
17986 | 2022-03-31 | 76605.0 |
17987 | 2022-03-31 | 79785.0 |
17988 | 2022-03-31 | 84788.0 |
17989 | 2022-03-31 | 86783.0 |
17990 | 2022-03-31 | 90319.0 |
17991 rows × 2 columns
db.raw_sql('select permno,askhi,date from crsp.dsf where askhi > 2000 order by askhi desc LIMIT 1')
permno | askhi | date | |
---|---|---|---|
0 | 17778.0 | 544389.25 | 2022-03-29 |
This is one example of how you might approach an analytical task using Python. It begins by gathering metadata information using list_libraries() and list_tables() to learn more about the data structure available, and then using that information to run meaningful queries from the data itself using get_table() and raw_sql()..
A common next step would be to write a batch program that uses the above one-off queries together. An example might be a program that uses a loop to iterate over each permno that has ever posted an Ask Price over $2000 and to calculate how long the date range was able to maintain that height. Or perhaps certain dates were more prolific than others - tallying the number of high asks per date might be informative.