Extract - Stock Datasets

Extract provides a data pipeline for analyzing stock data straight from the Redis cache.

Extraction API Examples

Extract All Data for a Ticker

import analysis_engine.extract as ae_extract
print(ae_extract.extract('SPY'))

Extract Latest Minute Pricing for Stocks and Options

import analysis_engine.extract as ae_extract
print(ae_extract.extract(
    'SPY',
    datasets=['minute', 'tdcalls', 'tdputs']))

Extract Historical Data

Extract historical data with the date argument formatted YYYY-MM-DD:

import analysis_engine.extract as ae_extract
print(ae_extract.extract(
    'AAPL',
    datasets=['minute', 'daily', 'financials', 'earnings', 'dividends'],
    date='2019-02-15'))

Additional Extraction APIs

IEX Cloud Extraction API Reference

Tradier Extraction API Reference

analysis_engine.extract.extract(ticker=None, datasets=None, tickers=None, use_key=None, extract_mode='all', iex_datasets=None, date=None, redis_enabled=True, redis_address=None, redis_db=None, redis_password=None, redis_expire=None, s3_enabled=True, s3_address=None, s3_bucket=None, s3_access_key=None, s3_secret_key=None, s3_region_name=None, s3_secure=False, celery_disabled=True, broker_url=None, result_backend=None, label=None, verbose=False)

Extracts all cached datasets for a stock ticker or a list of tickers and returns a dictionary. Make sure the datasets are already cached in Redis before calling this function. If they are not, use the analysis_engine.fetch.fetch function to prepare the datasets in your environment.

Python example:

from analysis_engine.extract import extract
d = extract(ticker='NFLX')
print(d)
for k in d['NFLX']:
    print(f'dataset key: {k}')
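
The loop above can be wrapped in a small helper when several tickers are extracted at once. This is a minimal sketch that assumes the returned dictionary is keyed by ticker and that each value can be iterated for its dataset keys, as the example suggests. The helper name dataset_keys_by_ticker and the stand-in dictionary are hypothetical, and the real dataset key names may differ.

```python
from typing import Dict, List, Mapping


def dataset_keys_by_ticker(
        result: Mapping[str, Mapping[str, object]]) -> Dict[str, List[str]]:
    """Map each ticker to a sorted list of its dataset keys."""
    return {ticker: sorted(datasets) for ticker, datasets in result.items()}


# Stand-in for an extract() result: the key names and None values
# are invented for illustration only.
fake_result = {'NFLX': {'minute': None, 'daily': None}}
for ticker, keys in dataset_keys_by_ticker(fake_result).items():
    print(f'{ticker}: {keys}')
```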

Extract Intraday Stock and Options Minute Pricing Data

This works by using the date and datasets arguments as filters:

import analysis_engine.extract as ae_extract
print(ae_extract.extract(
    ticker='SPY',
    datasets=['minute', 'tdcalls', 'tdputs']))

This function was created to reduce the amount of typing in Jupyter notebooks. It can also be set up for use with a distributed engine through the optional arguments, depending on your connectivity requirements.

Note

Please ensure Redis and Minio are running before trying to extract tickers.

Stock tickers to extract

Parameters:
  • ticker – single stock ticker/symbol/ETF to extract
  • tickers – optional - list of tickers to extract
  • use_key – optional - extract historical key from Redis usually formatted <TICKER>_<date formatted YYYY-MM-DD>
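
A value for use_key can be assembled from a ticker and a date. This is a minimal sketch that assumes the usual <TICKER>_<YYYY-MM-DD> layout described above. The helper build_history_key is hypothetical and not part of analysis_engine, and upper-casing the ticker is an assumption.

```python
import datetime


def build_history_key(ticker: str, day: datetime.date) -> str:
    """Return '<TICKER>_<YYYY-MM-DD>' (assumed key layout)."""
    # date.isoformat() always yields a zero-padded YYYY-MM-DD string
    return f'{ticker.upper()}_{day.isoformat()}'


key = build_history_key('spy', datetime.date(2019, 2, 15))
print(key)  # SPY_2019-02-15
```

The resulting string could then be passed as use_key, provided the cache actually holds a key with that name.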

(Optional) Data sources, datafeeds and datasets to gather

Parameters:
  • iex_datasets – list of strings for gathering specific IEX datasets which are set as consts: analysis_engine.iex.consts.FETCH_*.
  • date – optional - string date formatted YYYY-MM-DD - if not set use last close date
  • datasets – list of strings for indicator dataset extraction - preferred method (defaults to BACKUP_DATASETS)
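
Because the date argument is a string formatted YYYY-MM-DD, it can help to validate it before calling extract. The sketch below uses only the standard library. The helper parse_extract_date is hypothetical, and whether extract itself rejects malformed dates is not stated here.

```python
import datetime


def parse_extract_date(value: str) -> datetime.date:
    """Parse a zero-padded YYYY-MM-DD string or raise ValueError."""
    parsed = datetime.datetime.strptime(value, '%Y-%m-%d').date()
    # strptime also accepts '2019-2-15', so compare against the
    # canonical form to enforce zero padding
    if parsed.isoformat() != value:
        raise ValueError(f'expected zero-padded YYYY-MM-DD, got {value!r}')
    return parsed


print(parse_extract_date('2019-02-15'))
```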

(Optional) Redis connectivity arguments

Parameters:
  • redis_enabled – bool - toggle for auto-caching all datasets in Redis (default is True)
  • redis_address – Redis connection string format: host:port (default is localhost:6379)
  • redis_db – Redis db to use (default is 0)
  • redis_password – optional - Redis password (default is None)
  • redis_expire – optional - Redis expire value (default is None)
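
The host:port format used by redis_address (and by s3_address) can be split into its parts for a quick sanity check. This is a minimal sketch: split_address is a hypothetical helper, and falling back to port 6379 when no port is given is an assumption based on the documented default.

```python
from typing import Tuple


def split_address(address: str, default_port: int = 6379) -> Tuple[str, int]:
    """Split 'host:port' into (host, port); use default_port if no port."""
    host, sep, port = address.rpartition(':')
    if not sep:
        # no colon present, so the whole string is the host
        return address, default_port
    return host, int(port)


print(split_address('localhost:6379'))
```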

(Optional) Minio (S3) connectivity arguments

Parameters:
  • s3_enabled – bool - toggle for auto-archiving on Minio (S3) (default is True)
  • s3_address – Minio S3 connection string format: host:port (default is localhost:9000)
  • s3_bucket – S3 Bucket for storing the artifacts (default is dev) which should be viewable on a browser: http://localhost:9000/minio/dev/
  • s3_access_key – S3 Access key (default is trexaccesskey)
  • s3_secret_key – S3 Secret key (default is trex123321)
  • s3_region_name – S3 region name (default is us-east-1)
  • s3_secure – Transmit using TLS encryption (default is False)

(Optional) Celery worker broker connectivity arguments

Parameters:
  • celery_disabled – bool - toggle synchronous mode or publish to an engine connected to the Celery broker and backend (default is True - synchronous mode without an engine or need for a broker or backend for Celery)
  • broker_url – Celery broker url (default is redis://0.0.0.0:6379/13)
  • result_backend – Celery backend url (default is redis://0.0.0.0:6379/14)
  • label – tracking log label

(Optional) Debugging

Parameters:
  • verbose – bool - show extract warnings and other debug logging (default is False)

Supported environment variables

export REDIS_ADDRESS="localhost:6379"
export REDIS_DB="0"
export S3_ADDRESS="localhost:9000"
export S3_BUCKET="dev"
export AWS_ACCESS_KEY_ID="trexaccesskey"
export AWS_SECRET_ACCESS_KEY="trex123321"
export AWS_DEFAULT_REGION="us-east-1"
export S3_SECURE="0"
export WORKER_BROKER_URL="redis://0.0.0.0:6379/13"
export WORKER_BACKEND_URL="redis://0.0.0.0:6379/14"
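
The variables above can also be collected explicitly into keyword arguments. This is a sketch only: the helper extract_kwargs_from_env is hypothetical, the pairing of each variable with an extract parameter is an assumption inferred from the names and defaults listed above, and treating redis_db as an integer and S3_SECURE="1" as True are assumptions as well.

```python
import os
from typing import Any, Dict, Mapping


def extract_kwargs_from_env(
        env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build extract() keyword arguments from environment variables."""
    # Variable-to-parameter pairing below is assumed, not documented
    return {
        'redis_address': env.get('REDIS_ADDRESS', 'localhost:6379'),
        'redis_db': int(env.get('REDIS_DB', '0')),
        's3_address': env.get('S3_ADDRESS', 'localhost:9000'),
        's3_bucket': env.get('S3_BUCKET', 'dev'),
        's3_access_key': env.get('AWS_ACCESS_KEY_ID', 'trexaccesskey'),
        's3_secret_key': env.get('AWS_SECRET_ACCESS_KEY', 'trex123321'),
        's3_region_name': env.get('AWS_DEFAULT_REGION', 'us-east-1'),
        's3_secure': env.get('S3_SECURE', '0') == '1',
        'broker_url': env.get('WORKER_BROKER_URL', 'redis://0.0.0.0:6379/13'),
        'result_backend': env.get(
            'WORKER_BACKEND_URL', 'redis://0.0.0.0:6379/14'),
    }


kwargs = extract_kwargs_from_env()
print(sorted(kwargs))
```

With analysis_engine installed and the services running, the dictionary could be passed along as extract(ticker='SPY', **kwargs).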