Extract - Stock Datasets

Extract provides a data pipeline for analyzing stock data straight from the Redis cache.

Extraction API Examples

Extract All Data for a Ticker

import analysis_engine.extract as ae_extract
print(ae_extract.extract(ticker='NFLX'))

Extract Latest Minute Pricing for Stocks and Options

import analysis_engine.extract as ae_extract
print(ae_extract.extract(
    ticker='NFLX',
    datasets=['minute', 'tdcalls', 'tdputs']))

Extract Historical Data

Extract historical data with the date argument formatted YYYY-MM-DD:

import analysis_engine.extract as ae_extract
print(ae_extract.extract(
    ticker='NFLX', date='2019-02-15',  # example date formatted YYYY-MM-DD
    datasets=['minute', 'daily', 'financials', 'earnings', 'dividends']))

Additional Extraction APIs

IEX Cloud Extraction API Reference

Tradier Extraction API Reference

analysis_engine.extract.extract(ticker=None, datasets=None, tickers=None, use_key=None, extract_mode='all', iex_datasets=None, date=None, redis_enabled=True, redis_address=None, redis_db=None, redis_password=None, redis_expire=None, s3_enabled=True, s3_address=None, s3_bucket=None, s3_access_key=None, s3_secret_key=None, s3_region_name=None, s3_secure=False, celery_disabled=True, broker_url=None, result_backend=None, label=None, verbose=False)

Extracts all cached datasets for a stock ticker or a list of tickers and returns a dictionary keyed by ticker. Make sure the datasets are already cached in Redis before calling this function. If they are not, use the analysis_engine.fetch.fetch function to prepare the datasets in your environment.
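The "make sure the datasets are already cached" step can be sketched as a small pre-check. find_missing_keys and FakeRedis are hypothetical names, not part of analysis_engine; the helper only relies on a redis-py style exists() method, which returns how many of the given keys exist. The example key follows the <TICKER>_<YYYY-MM-DD> format described for use_key.

```python
from typing import Iterable, List, Protocol


class ExistsClient(Protocol):
    """Anything with a redis-py style exists(*names) -> int method."""

    def exists(self, *names: str) -> int: ...


def find_missing_keys(client: ExistsClient, keys: Iterable[str]) -> List[str]:
    """Hypothetical pre-check: return the keys that are not cached yet."""
    # exists() returns how many of the given names exist, so 0 means missing.
    return [key for key in keys if client.exists(key) == 0]


class FakeRedis:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = set(keys)

    def exists(self, *names: str) -> int:
        # Count how many of the requested names are present.
        return sum(1 for name in names if name in self._keys)
```

Against a live server, a redis-py client such as redis.Redis(host='localhost', port=6379, db=0) could be passed instead of FakeRedis; any key reported missing would need to be fetched first with analysis_engine.fetch.fetch.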

Python example:

from analysis_engine.extract import extract
d = extract(ticker='NFLX')
for k in d['NFLX']:
    print(f'dataset key: {k}')

Extract Intraday Stock and Options Minute Pricing Data

This works by using the date and datasets arguments as filters:

import analysis_engine.extract as ae_extract
d = ae_extract.extract(
    ticker='NFLX', date='2019-02-15',  # example date formatted YYYY-MM-DD
    datasets=['minute', 'tdcalls', 'tdputs'])

This function was created to reduce the amount of typing in Jupyter notebooks. It can also be used with a distributed engine by setting the optional arguments to match your connectivity requirements.
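The way the datasets argument acts as a filter can be pictured with a small in-memory sketch. It assumes the cached data for one ticker and date is a plain dict keyed by dataset name (minute, tdcalls, tdputs, and so on); select_datasets is a hypothetical helper, not part of analysis_engine, and the engine's real Redis layout may differ.

```python
from typing import Any, Dict, Iterable, List, Tuple


def select_datasets(
    cached: Dict[str, Any], datasets: Iterable[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical filter: keep only the requested dataset names.

    `cached` stands in for everything cached for one ticker and date,
    keyed by dataset name (an assumed layout). Returns the selected
    datasets plus the requested names that were not found.
    """
    selected: Dict[str, Any] = {}
    missing: List[str] = []
    for name in datasets:
        if name in cached:
            selected[name] = cached[name]
        else:
            # Requested but not cached: report it instead of failing.
            missing.append(name)
    return selected, missing
```

Reporting missing names separately keeps a partially cached ticker usable while still showing which datasets would need to be fetched.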


Please ensure Redis and Minio are running before trying to extract tickers.

Stock tickers to extract

  • ticker – single stock ticker/symbol/ETF to extract
  • tickers – optional - list of tickers to extract
  • use_key – optional - extract historical key from Redis usually formatted <TICKER>_<date formatted YYYY-MM-DD>
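A historical use_key can be assembled from a ticker and a date. The sketch below is a hypothetical helper, not an analysis_engine function; uppercasing the ticker is an assumption based on the <TICKER> placeholder, and the date check enforces the YYYY-MM-DD format strictly.

```python
from datetime import datetime


def build_use_key(ticker: str, date: str) -> str:
    """Hypothetical helper: build a historical key <TICKER>_<YYYY-MM-DD>.

    Uppercasing the ticker is an assumption based on the <TICKER> placeholder.
    """
    # strptime raises ValueError for dates that cannot be parsed at all.
    parsed = datetime.strptime(date, "%Y-%m-%d")
    # strptime also accepts '2019-2-5', so round-trip to require zero padding.
    if parsed.strftime("%Y-%m-%d") != date:
        raise ValueError(f"date must be formatted YYYY-MM-DD: {date!r}")
    return f"{ticker.strip().upper()}_{date}"
```

For example, build_use_key('nflx', '2019-02-15') returns 'NFLX_2019-02-15', which could then be passed as use_key.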

(Optional) Data sources, datafeeds and datasets to gather

  • iex_datasets – list of strings for gathering specific IEX datasets which are set as consts: analysis_engine.iex.consts.FETCH_*.
  • date – optional - string date formatted YYYY-MM-DD - if not set use last close date
  • datasets – list of strings for indicator dataset extraction - preferred method (defaults to BACKUP_DATASETS)
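When date is not set, extract falls back to the last close date. The sketch below only approximates that idea and is not taken from analysis_engine: approx_last_close_date is hypothetical, assumes a naive US/Eastern timestamp and a 4:00 PM regular-session close, and ignores market holidays and early closes.

```python
from datetime import datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumption: 4:00 PM US/Eastern close


def approx_last_close_date(now: datetime) -> str:
    """Hypothetical helper: approximate the last close date as YYYY-MM-DD.

    Assumes `now` is naive US/Eastern time; ignores holidays and early closes.
    """
    day = now.date()
    # Before today's close, the most recent completed session is an earlier day.
    if now.time() < MARKET_CLOSE:
        day -= timedelta(days=1)
    # Step back over weekends (Saturday=5, Sunday=6).
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    return day.isoformat()
```

A production version would need an exchange calendar to skip holidays; the weekend loop is only the simplest part of that logic.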

(Optional) Redis connectivity arguments

  • redis_enabled – bool - toggle for auto-caching all datasets in Redis (default is True)
  • redis_address – Redis connection string format: host:port (default is localhost:6379)
  • redis_db – Redis db to use (default is 0)
  • redis_password – optional - Redis password (default is None)
  • redis_expire – optional - Redis expire value (default is None)
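The redis_address argument (and s3_address below) uses a host:port string. A minimal sketch of splitting it is shown here; split_address is a hypothetical helper, falling back to a default port when none is given is an assumption, and IPv6 literals are not handled.

```python
from typing import Tuple


def split_address(address: str, default_port: int) -> Tuple[str, int]:
    """Hypothetical helper: split a 'host:port' connection string.

    Falls back to default_port when no port is given (an assumption).
    """
    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the host.
        return address, default_port
    return host, int(port)
```

The resulting host and port could then be handed to a redis-py client, for example redis.Redis(host=host, port=port, db=0).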

(Optional) Minio (S3) connectivity arguments

  • s3_enabled – bool - toggle for auto-archiving on Minio (S3) (default is True)
  • s3_address – Minio S3 connection string format: host:port (default is localhost:9000)
  • s3_bucket – S3 Bucket for storing the artifacts (default is dev) which should be viewable on a browser: http://localhost:9000/minio/dev/
  • s3_access_key – S3 Access key (default is trexaccesskey)
  • s3_secret_key – S3 Secret key (default is trex123321)
  • s3_region_name – S3 region name (default is us-east-1)
  • s3_secure – transmit using TLS encryption (default is False)
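The Minio browser URL in the s3_bucket description can be derived from the other S3 arguments. minio_browser_url is a hypothetical helper that mirrors the http://localhost:9000/minio/dev/ example; switching to https when s3_secure is True is an assumption, and the /minio/<bucket>/ path may differ in newer Minio releases.

```python
def minio_browser_url(s3_address: str = "localhost:9000",
                      s3_bucket: str = "dev",
                      s3_secure: bool = False) -> str:
    """Hypothetical helper: URL for browsing a bucket in the Minio web UI.

    Mirrors the http://localhost:9000/minio/dev/ example; using https
    when s3_secure is True is an assumption.
    """
    scheme = "https" if s3_secure else "http"
    return f"{scheme}://{s3_address}/minio/{s3_bucket}/"
```

With the documented defaults this returns 'http://localhost:9000/minio/dev/'.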

(Optional) Celery worker broker connectivity arguments

  • celery_disabled – bool - toggle synchronous mode or publish to an engine connected to the Celery broker and backend (default is True - synchronous mode without an engine or need for a broker or backend for Celery)
  • broker_url – Celery broker url (default is redis://
  • result_backend – Celery backend url (default is redis://
  • label – tracking log label

(Optional) Debugging

  • verbose – bool - show extract warnings and other debug logging (default is False)

Supported environment variables

export REDIS_ADDRESS="localhost:6379"
export REDIS_DB="0"
export S3_ADDRESS="localhost:9000"
export S3_BUCKET="dev"
export AWS_ACCESS_KEY_ID="trexaccesskey"
export AWS_SECRET_ACCESS_KEY="trex123321"
export AWS_DEFAULT_REGION="us-east-1"
export S3_SECURE="0"
export WORKER_BROKER_URL="redis://"
export WORKER_BACKEND_URL="redis://"
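The environment variables above line up with extract() keyword arguments that share the same defaults. The sketch below makes that mapping explicit; settings_from_env is a hypothetical helper, the variable-to-parameter pairing is inferred from the matching defaults, and treating only "1" as enabling S3_SECURE is an assumption. The worker URLs are left out because their defaults are not fully shown above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: map supported environment variables to extract()
    keyword arguments, falling back to the documented defaults.
    """
    env = os.environ if env is None else env
    return {
        "redis_address": env.get("REDIS_ADDRESS", "localhost:6379"),
        "redis_db": int(env.get("REDIS_DB", "0")),
        "s3_address": env.get("S3_ADDRESS", "localhost:9000"),
        "s3_bucket": env.get("S3_BUCKET", "dev"),
        "s3_access_key": env.get("AWS_ACCESS_KEY_ID", "trexaccesskey"),
        "s3_secret_key": env.get("AWS_SECRET_ACCESS_KEY", "trex123321"),
        "s3_region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
        # Assumption: "1" enables TLS and any other value disables it.
        "s3_secure": env.get("S3_SECURE", "0") == "1",
    }
```

Because the keys match parameter names in the extract() signature, the result could be unpacked directly, for example extract(ticker='NFLX', **settings_from_env()).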