# Source code for analysis_engine.extract
"""
**Extraction API Examples**
**Extract All Data for a Ticker**
.. code-block:: python
import analysis_engine.extract as ae_extract
print(ae_extract.extract('SPY'))
**Extract Latest Minute Pricing for Stocks and Options**
.. code-block:: python
import analysis_engine.extract as ae_extract
print(ae_extract.extract(
'SPY',
datasets=['minute', 'tdcalls', 'tdputs']))
**Extract Historical Data**
Extract historical data with the ``date`` argument formatted ``YYYY-MM-DD``:
.. code-block:: python
import analysis_engine.extract as ae_extract
print(ae_extract.extract(
'AAPL',
datasets=['minute', 'daily', 'financials', 'earnings', 'dividends'],
date='2019-02-15'))
**Additional Extraction APIs**
`IEX Cloud Extraction API Reference <https://stock-analysis-engine.
readthedocs.io/en/latest/iex_api.html#iex-extraction-api-reference>`__
`Tradier Extraction API Reference <https://stock-analysis-engine.
readthedocs.io/en/latest/tradier.html#tradier-extraction-api-reference>`__
"""
import os
import analysis_engine.consts as ae_consts
import analysis_engine.utils as ae_utils
import analysis_engine.build_dataset_node as build_dataset_node
import analysis_engine.api_requests as api_requests
import spylunking.log.setup_logging as log_utils
log = log_utils.build_colorized_logger(name=__name__)
def extract(
        ticker=None,
        datasets=None,
        tickers=None,
        use_key=None,
        extract_mode='all',
        iex_datasets=None,
        date=None,
        redis_enabled=True,
        redis_address=None,
        redis_db=None,
        redis_password=None,
        redis_expire=None,
        s3_enabled=True,
        s3_address=None,
        s3_bucket=None,
        s3_access_key=None,
        s3_secret_key=None,
        s3_region_name=None,
        s3_secure=False,
        celery_disabled=True,
        broker_url=None,
        result_backend=None,
        label=None,
        verbose=False):
"""extract
Extract all cached datasets for a stock ``ticker`` or
a list of ``tickers`` and returns a dictionary. Please
make sure the datasets are already cached in Redis
before running this method. If not please refer to
the ``analysis_engine.fetch.fetch`` function
to prepare the datasets on your environment.
Python example:
.. code-block:: python
from analysis_engine.extract import extract
d = extract(ticker='NFLX')
print(d)
for k in d['NFLX']:
print(f'dataset key: {k}')
**Extract Intraday Stock and Options Minute Pricing Data**
This works by using the ``date`` and ``datasets`` arguments
as filters:
.. code-block:: python
import analysis_engine.extract as ae_extract
print(ae_extract.extract(
ticker='SPY',
datasets=['minute', 'tdcalls', 'tdputs'])
This was created for reducing the amount of typying in
Jupyter notebooks. It can be set up for use with a
distributed engine as well with the optional arguments
depending on your connectitivty requirements.
.. note:: Please ensure Redis and Minio are running
before trying to extract tickers
**Stock tickers to extract**
:param ticker: single stock ticker/symbol/ETF to extract
:param tickers: optional - list of tickers to extract
:param use_key: optional - extract historical key from Redis
usually formatted ``<TICKER>_<date formatted YYYY-MM-DD>``
**(Optional) Data sources, datafeeds and datasets to gather**
:param iex_datasets: list of strings for gathering specific `IEX
datasets <https://iexcloud.io/>`__
which are set as consts: ``analysis_engine.iex.consts.FETCH_*``.
:param date: optional - string date formatted
``YYYY-MM-DD`` - if not set use last close date
:param datasets: list of strings for indicator
dataset extraction - preferred method
(defaults to ``BACKUP_DATASETS``)
**(Optional) Redis connectivity arguments**
:param redis_enabled: bool - toggle for auto-caching all
datasets in Redis
(default is ``True``)
:param redis_address: Redis connection string format: ``host:port``
(default is ``localhost:6379``)
:param redis_db: Redis db to use
(default is ``0``)
:param redis_password: optional - Redis password
(default is ``None``)
:param redis_expire: optional - Redis expire value
(default is ``None``)
**(Optional) Minio (S3) connectivity arguments**
:param s3_enabled: bool - toggle for auto-archiving on Minio (S3)
(default is ``True``)
:param s3_address: Minio S3 connection string format: ``host:port``
(default is ``localhost:9000``)
:param s3_bucket: S3 Bucket for storing the artifacts
(default is ``dev``) which should be viewable on a browser:
http://localhost:9000/minio/dev/
:param s3_access_key: S3 Access key
(default is ``trexaccesskey``)
:param s3_secret_key: S3 Secret key
(default is ``trex123321``)
:param s3_region_name: S3 region name
(default is ``us-east-1``)
:param s3_secure: Transmit using tls encryption
(default is ``False``)
**(Optional) Celery worker broker connectivity arguments**
:param celery_disabled: bool - toggle synchronous mode or publish
to an engine connected to the `Celery broker and backend
<https://github.com/celery/celery#transports-and-backends>`__
(default is ``True`` - synchronous mode without an engine
or need for a broker or backend for Celery)
:param broker_url: Celery broker url
(default is ``redis://0.0.0.0:6379/13``)
:param result_backend: Celery backend url
(default is ``redis://0.0.0.0:6379/14``)
:param label: tracking log label
**(Optional) Debugging**
:param verbose: bool - show extract warnings
and other debug logging (default is False)
**Supported environment variables**
::
export REDIS_ADDRESS="localhost:6379"
export REDIS_DB="0"
export S3_ADDRESS="localhost:9000"
export S3_BUCKET="dev"
export AWS_ACCESS_KEY_ID="trexaccesskey"
export AWS_SECRET_ACCESS_KEY="trex123321"
export AWS_DEFAULT_REGION="us-east-1"
export S3_SECURE="0"
export WORKER_BROKER_URL="redis://0.0.0.0:6379/13"
export WORKER_BACKEND_URL="redis://0.0.0.0:6379/14"
"""
    rec = {}
    extract_requests = []

    use_tickers = tickers
    if ticker:
        use_tickers = [ticker]
    else:
        if not use_tickers:
            use_tickers = []

    default_iex_datasets = [
        'daily',
        'minute',
        'quote',
        'stats',
        'peers',
        'news',
        'financials',
        'earnings',
        'dividends',
        'company'
    ]

    if not iex_datasets:
        iex_datasets = default_iex_datasets

    use_indicator_datasets = datasets
    if not use_indicator_datasets:
        use_indicator_datasets = ae_consts.BACKUP_DATASETS
    if redis_enabled:
        if not redis_address:
            redis_address = os.getenv(
                'REDIS_ADDRESS',
                'localhost:6379')
        if not redis_password:
            redis_password = os.getenv(
                'REDIS_PASSWORD',
                None)
        if not redis_db:
            redis_db = int(os.getenv(
                'REDIS_DB',
                '0'))
        if not redis_expire:
            redis_expire = os.getenv(
                'REDIS_EXPIRE',
                None)
    if s3_enabled:
        if not s3_address:
            s3_address = os.getenv(
                'S3_ADDRESS',
                'localhost:9000')
        if not s3_access_key:
            s3_access_key = os.getenv(
                'AWS_ACCESS_KEY_ID',
                'trexaccesskey')
        if not s3_secret_key:
            s3_secret_key = os.getenv(
                'AWS_SECRET_ACCESS_KEY',
                'trex123321')
        if not s3_region_name:
            s3_region_name = os.getenv(
                'AWS_DEFAULT_REGION',
                'us-east-1')
        if not s3_secure:
            s3_secure = os.getenv(
                'S3_SECURE',
                '0') == '1'
        if not s3_bucket:
            s3_bucket = os.getenv(
                'S3_BUCKET',
                'dev')
    if not broker_url:
        broker_url = os.getenv(
            'WORKER_BROKER_URL',
            'redis://0.0.0.0:6379/13')
    if not result_backend:
        result_backend = os.getenv(
            'WORKER_BACKEND_URL',
            'redis://0.0.0.0:6379/14')

    if not label:
        label = 'get-latest'
    last_close_str = ae_utils.get_last_close_str()
    use_date_str = last_close_str
    if date:
        use_date_str = date

    # honor an explicit historical key when one is provided,
    # otherwise build the default <TICKER>_<YYYY-MM-DD> key
    ticker_key = use_key
    if not ticker_key:
        ticker_key = f'{ticker}_{use_date_str}'
    common_vals = {}
    common_vals['base_key'] = ticker_key
    common_vals['celery_disabled'] = celery_disabled
    common_vals['ticker'] = ticker
    common_vals['label'] = label
    common_vals['iex_datasets'] = iex_datasets
    common_vals['s3_enabled'] = s3_enabled
    common_vals['s3_bucket'] = s3_bucket
    common_vals['s3_address'] = s3_address
    common_vals['s3_secure'] = s3_secure
    common_vals['s3_region_name'] = s3_region_name
    common_vals['s3_access_key'] = s3_access_key
    common_vals['s3_secret_key'] = s3_secret_key
    common_vals['s3_key'] = ticker_key
    common_vals['redis_enabled'] = redis_enabled
    common_vals['redis_address'] = redis_address
    common_vals['redis_password'] = redis_password
    common_vals['redis_db'] = redis_db
    common_vals['redis_key'] = ticker_key
    common_vals['redis_expire'] = redis_expire
    if verbose:
        log.info(
            f'{label} - extract ticker={ticker} last_close={last_close_str} '
            f'base_key={common_vals["base_key"]} '
            f'redis_address={common_vals["redis_address"]} '
            f's3_address={common_vals["s3_address"]}')

    """
    Extract Supported Datasets
    """
    for ticker in use_tickers:
        req = api_requests.get_ds_dict(
            ticker=ticker,
            base_key=common_vals['base_key'],
            ds_id=label)
        extract_requests.append(req)
    # end of for all tickers in use_tickers

    for extract_req in extract_requests:
        # use the ticker the request was built for, not the
        # leftover loop variable from the loop above
        extract_ticker = extract_req['ticker']
        ticker_data = build_dataset_node.build_dataset_node(
            ticker=extract_ticker,
            date=use_date_str,
            datasets=use_indicator_datasets,
            verbose=verbose)
        rec[extract_ticker] = ticker_data
    # end of for extract_req in extract_requests

    return rec
# end of extract
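The cache-key convention used above can be sketched standalone. The helper below is hypothetical (not part of this module); it only mirrors the ``<TICKER>_<YYYY-MM-DD>`` key format and the fallback order described in the ``extract`` docstring: an explicit ``use_key`` wins, then an explicit ``date``, then the last close date.

```python
# Illustrative sketch of the Redis/S3 cache-key selection logic.
# build_ticker_key is a hypothetical helper; the date strings below
# are example values, not real market data.
def build_ticker_key(ticker, date=None, last_close_str='2019-02-15',
                     use_key=None):
    """Return the cache key: explicit use_key > date > last close date."""
    if use_key:
        return use_key
    use_date_str = date if date else last_close_str
    return f'{ticker}_{use_date_str}'


print(build_ticker_key('SPY'))                      # SPY_2019-02-15
print(build_ticker_key('AAPL', date='2019-01-02'))  # AAPL_2019-01-02
```

A key built this way is what ``extract`` passes along as both the ``redis_key`` and ``s3_key``, so a dataset cached for one date can be re-extracted later by name.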