Source code for analysis_engine.extract

"""
**Extraction API Examples**

**Extract All Data for a Ticker**

.. code-block:: python

    import analysis_engine.extract as ae_extract
    print(ae_extract.extract('SPY'))

**Extract Latest Minute Pricing for Stocks and Options**

.. code-block:: python

    import analysis_engine.extract as ae_extract
    print(ae_extract.extract(
        'SPY',
        datasets=['minute', 'tdcalls', 'tdputs']))

**Extract Historical Data**

Extract historical data with the ``date`` argument formatted ``YYYY-MM-DD``:

.. code-block:: python

    import analysis_engine.extract as ae_extract
    print(ae_extract.extract(
        'AAPL',
        datasets=['minute', 'daily', 'financials', 'earnings', 'dividends'],
        date='2019-02-15'))

**Additional Extraction APIs**

`IEX Cloud Extraction API Reference <https://stock-analysis-engine.
readthedocs.io/en/latest/iex_api.html#iex-extraction-api-reference>`__

`Tradier Extraction API Reference <https://stock-analysis-engine.
readthedocs.io/en/latest/tradier.html#tradier-extraction-api-reference>`__

"""

import os
import analysis_engine.consts as ae_consts
import analysis_engine.utils as ae_utils
import analysis_engine.build_dataset_node as build_dataset_node
import analysis_engine.api_requests as api_requests
import spylunking.log.setup_logging as log_utils

log = log_utils.build_colorized_logger(name=__name__)


def extract(
        ticker=None,
        datasets=None,
        tickers=None,
        use_key=None,
        extract_mode='all',
        iex_datasets=None,
        date=None,
        redis_enabled=True,
        redis_address=None,
        redis_db=None,
        redis_password=None,
        redis_expire=None,
        s3_enabled=True,
        s3_address=None,
        s3_bucket=None,
        s3_access_key=None,
        s3_secret_key=None,
        s3_region_name=None,
        s3_secure=False,
        celery_disabled=True,
        broker_url=None,
        result_backend=None,
        label=None,
        verbose=False):
    """extract

    Extract all cached datasets for a stock ``ticker`` or
    a list of ``tickers`` and return a dictionary. Please
    make sure the datasets are already cached in Redis
    before running this method. If not, please refer to
    the ``analysis_engine.fetch.fetch`` function to
    prepare the datasets on your environment.

    Python example:

    .. code-block:: python

        from analysis_engine.extract import extract
        d = extract(ticker='NFLX')
        print(d)
        for k in d['NFLX']:
            print(f'dataset key: {k}')

    **Extract Intraday Stock and Options Minute Pricing Data**

    This works by using the ``date`` and ``datasets``
    arguments as filters:

    .. code-block:: python

        import analysis_engine.extract as ae_extract
        print(ae_extract.extract(
            ticker='SPY',
            datasets=['minute', 'tdcalls', 'tdputs']))

    This was created to reduce the amount of typing in
    Jupyter notebooks. It can also be used with a
    distributed engine by setting the optional arguments
    below, depending on your connectivity requirements.

    .. note:: Please ensure Redis and Minio are running
        before trying to extract tickers

    **Stock tickers to extract**

    :param ticker: single stock ticker/symbol/ETF to extract
    :param tickers: optional - list of tickers to extract
    :param use_key: optional - extract historical key from Redis
        usually formatted ``<TICKER>_<date formatted YYYY-MM-DD>``

    **(Optional) Data sources, datafeeds and datasets to gather**

    :param iex_datasets: list of strings for gathering specific `IEX
        datasets <https://iexcloud.io/>`__
        which are set as consts: ``analysis_engine.iex.consts.FETCH_*``.
    :param date: optional - string date formatted ``YYYY-MM-DD`` -
        if not set use last close date
    :param datasets: list of strings for indicator dataset
        extraction - preferred method (defaults to ``BACKUP_DATASETS``)

    **(Optional) Redis connectivity arguments**

    :param redis_enabled: bool - toggle for auto-caching all
        datasets in Redis
        (default is ``True``)
    :param redis_address: Redis connection string format: ``host:port``
        (default is ``localhost:6379``)
    :param redis_db: Redis db to use
        (default is ``0``)
    :param redis_password: optional - Redis password
        (default is ``None``)
    :param redis_expire: optional - Redis expire value
        (default is ``None``)

    **(Optional) Minio (S3) connectivity arguments**

    :param s3_enabled: bool - toggle for auto-archiving on Minio (S3)
        (default is ``True``)
    :param s3_address: Minio S3 connection string format: ``host:port``
        (default is ``localhost:9000``)
    :param s3_bucket: S3 Bucket for storing the artifacts
        (default is ``dev``) which should be viewable on a browser:
        http://localhost:9000/minio/dev/
    :param s3_access_key: S3 Access key
        (default is ``trexaccesskey``)
    :param s3_secret_key: S3 Secret key
        (default is ``trex123321``)
    :param s3_region_name: S3 region name
        (default is ``us-east-1``)
    :param s3_secure: Transmit using tls encryption
        (default is ``False``)

    **(Optional) Celery worker broker connectivity arguments**

    :param celery_disabled: bool - toggle synchronous mode or
        publish to an engine connected to the `Celery broker and backend
        <https://github.com/celery/celery#transports-and-backends>`__
        (default is ``True`` - synchronous mode without an engine
        or need for a broker or backend for Celery)
    :param broker_url: Celery broker url
        (default is ``redis://0.0.0.0:6379/13``)
    :param result_backend: Celery backend url
        (default is ``redis://0.0.0.0:6379/14``)
    :param label: tracking log label

    **(Optional) Debugging**

    :param verbose: bool - show extract warnings
        and other debug logging
        (default is ``False``)

    **Supported environment variables**

    ::

        export REDIS_ADDRESS="localhost:6379"
        export REDIS_DB="0"
        export S3_ADDRESS="localhost:9000"
        export S3_BUCKET="dev"
        export AWS_ACCESS_KEY_ID="trexaccesskey"
        export AWS_SECRET_ACCESS_KEY="trex123321"
        export AWS_DEFAULT_REGION="us-east-1"
        export S3_SECURE="0"
        export WORKER_BROKER_URL="redis://0.0.0.0:6379/13"
        export WORKER_BACKEND_URL="redis://0.0.0.0:6379/14"
    """
    rec = {}
    extract_requests = []

    use_tickers = tickers
    if ticker:
        use_tickers = [ticker]
    else:
        if not use_tickers:
            use_tickers = []

    default_iex_datasets = [
        'daily',
        'minute',
        'quote',
        'stats',
        'peers',
        'news',
        'financials',
        'earnings',
        'dividends',
        'company'
    ]

    if not iex_datasets:
        iex_datasets = default_iex_datasets

    use_indicator_datasets = datasets
    if not use_indicator_datasets:
        use_indicator_datasets = ae_consts.BACKUP_DATASETS

    if redis_enabled:
        if not redis_address:
            redis_address = os.getenv(
                'REDIS_ADDRESS', 'localhost:6379')
        if not redis_password:
            redis_password = os.getenv(
                'REDIS_PASSWORD', None)
        if not redis_db:
            redis_db = int(os.getenv(
                'REDIS_DB', '0'))
        if not redis_expire:
            redis_expire = os.getenv(
                'REDIS_EXPIRE', None)
    if s3_enabled:
        if not s3_address:
            s3_address = os.getenv(
                'S3_ADDRESS', 'localhost:9000')
        if not s3_access_key:
            s3_access_key = os.getenv(
                'AWS_ACCESS_KEY_ID', 'trexaccesskey')
        if not s3_secret_key:
            s3_secret_key = os.getenv(
                'AWS_SECRET_ACCESS_KEY', 'trex123321')
        if not s3_region_name:
            s3_region_name = os.getenv(
                'AWS_DEFAULT_REGION', 'us-east-1')
        if not s3_secure:
            s3_secure = os.getenv(
                'S3_SECURE', '0') == '1'
        if not s3_bucket:
            s3_bucket = os.getenv(
                'S3_BUCKET', 'dev')
    if not broker_url:
        broker_url = os.getenv(
            'WORKER_BROKER_URL', 'redis://0.0.0.0:6379/13')
    if not result_backend:
        result_backend = os.getenv(
            'WORKER_BACKEND_URL', 'redis://0.0.0.0:6379/14')

    if not label:
        label = 'get-latest'

    last_close_str = ae_utils.get_last_close_str()
    use_date_str = last_close_str
    if date:
        use_date_str = date

    # prefer an explicit historical key; otherwise derive the
    # Redis/S3 key from the ticker and the extraction date
    ticker_key = use_key
    if not ticker_key:
        ticker_key = f'{ticker}_{use_date_str}'

    common_vals = {}
    common_vals['base_key'] = ticker_key
    common_vals['celery_disabled'] = celery_disabled
    common_vals['ticker'] = ticker
    common_vals['label'] = label
    common_vals['iex_datasets'] = iex_datasets
    common_vals['s3_enabled'] = s3_enabled
    common_vals['s3_bucket'] = s3_bucket
    common_vals['s3_address'] = s3_address
    common_vals['s3_secure'] = s3_secure
    common_vals['s3_region_name'] = s3_region_name
    common_vals['s3_access_key'] = s3_access_key
    common_vals['s3_secret_key'] = s3_secret_key
    common_vals['s3_key'] = ticker_key
    common_vals['redis_enabled'] = redis_enabled
    common_vals['redis_address'] = redis_address
    common_vals['redis_password'] = redis_password
    common_vals['redis_db'] = redis_db
    common_vals['redis_key'] = ticker_key
    common_vals['redis_expire'] = redis_expire

    if verbose:
        log.info(
            f'{label} - extract ticker={ticker} last_close={last_close_str} '
            f'base_key={common_vals["base_key"]} '
            f'redis_address={common_vals["redis_address"]} '
            f's3_address={common_vals["s3_address"]}')

    """
    Extract Supported Datasets
    """
    for ticker in use_tickers:
        req = api_requests.get_ds_dict(
            ticker=ticker,
            base_key=common_vals['base_key'],
            ds_id=label)
        extract_requests.append(req)
    # end of for all ticker in use_tickers

    for extract_req in extract_requests:
        # use each request's own ticker so multi-ticker
        # extractions do not all resolve to the last ticker
        req_ticker = extract_req['ticker']
        ticker_data = build_dataset_node.build_dataset_node(
            ticker=req_ticker,
            date=use_date_str,
            datasets=use_indicator_datasets,
            verbose=verbose)
        rec[req_ticker] = ticker_data
    # end of for all extract_requests

    return rec
# end of extract
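
The connectivity arguments above all resolve the same way: an explicit argument wins, then the matching environment variable, then a hard-coded default. A minimal self-contained sketch of that fallback pattern (the helper name ``resolve_redis_settings`` is illustrative only, not part of this module; it mirrors the ``if not ...: os.getenv(...)`` checks used inside ``extract``):

```python
import os


def resolve_redis_settings(redis_address=None, redis_db=None):
    """Resolve Redis connectivity the way ``extract`` does:
    explicit argument, then environment variable, then default."""
    if not redis_address:
        redis_address = os.getenv('REDIS_ADDRESS', 'localhost:6379')
    if not redis_db:
        # env vars are strings, so cast the db number
        redis_db = int(os.getenv('REDIS_DB', '0'))
    return redis_address, redis_db


# an explicit argument always wins over the environment
print(resolve_redis_settings('redis-master:6379', 2))
# with no arguments and no env vars set, the defaults apply:
# ('localhost:6379', 0)
```

Note one quirk this mirrors from the source: the checks use ``if not redis_db``, so a falsy-but-valid value such as db ``0`` passed explicitly still falls through to the environment lookup.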