Extraction Utility Helper - Perform Extraction from Redis or S3

Dataset Extraction Utilities

Helper for extracting a dataset from Redis or S3 and loading it into a pandas.DataFrame. It is designed to be agnostic to the source of the dataset (IEX vs Yahoo), so the extract and load operations run without knowledge of the underlying dataset.

Supported environment variables:

# verbose logging in this module
export DEBUG_EXTRACT=1

# verbose logging for just Redis operations in this module
export DEBUG_REDIS_EXTRACT=1

# verbose logging for just S3 operations in this module
export DEBUG_S3_EXTRACT=1

# to show debug and trace logging, export ``SHARED_LOG_CFG``
# with the path to a debug logger json file. To turn on debugging
# for this library, point the variable at the repo's
# included file with the command:
export SHARED_LOG_CFG=/opt/sa/analysis_engine/log/debug-logging.json
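As an illustration of how the three debug flags above could be read from Python, here is a minimal sketch. The helper names ``env_flag`` and ``extract_debug_flags`` are hypothetical and are not part of ``analysis_engine``; the sketch also assumes that a flag counts as enabled only when its value is exactly ``1``, which the documentation above suggests but does not state.

```python
import os


def env_flag(name, environ=None):
    """Return True when the variable is set to '1'.

    Assumption: '1' is the only value that enables a flag.
    ``environ`` defaults to os.environ and can be replaced for testing.
    """
    env = os.environ if environ is None else environ
    return env.get(name, "0").strip() == "1"


def extract_debug_flags(environ=None):
    """Collect the three documented flags into one dictionary.

    Hypothetical helper: each flag is reported independently, with no
    assumption about whether DEBUG_EXTRACT implies the other two.
    """
    return {
        "extract": env_flag("DEBUG_EXTRACT", environ),
        "redis": env_flag("DEBUG_REDIS_EXTRACT", environ),
        "s3": env_flag("DEBUG_S3_EXTRACT", environ),
    }


if __name__ == "__main__":
    # Reads the real process environment.
    print(extract_debug_flags())
```

Passing a plain dictionary instead of the process environment, for example ``extract_debug_flags({"DEBUG_REDIS_EXTRACT": "1"})``, makes it easy to check the parsing without exporting anything in the shell.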

analysis_engine.extract_utils.perform_extract(df_type, df_str, work_dict, dataset_id_key='ticker', scrub_mode='sort-by-date', verbose=False)

Helper for extracting from Redis or S3

Parameters:
    df_type – datafeed type enum
    df_str – dataset string name
    work_dict – incoming work request dictionary
    dataset_id_key – configurable dataset identifier key for tracking scrubbing and debugging errors
    scrub_mode – scrubbing mode on extraction for one-off cleanup before analysis
    verbose – optional - boolean for turning on logging
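To give a feel for what a scrub mode named ``sort-by-date`` might do before analysis, the sketch below orders plain dictionary records by a date field. It is not the library's implementation: the function ``scrub_sort_by_date``, the ``date`` field name, the ``%Y-%m-%d`` format, and the choice to drop rows with a missing or invalid date are all assumptions made for illustration. The ``dataset_id`` argument mirrors the role described for ``dataset_id_key``, namely labeling a dataset in scrubbing messages.

```python
from datetime import datetime


def scrub_sort_by_date(records, date_key="date", dataset_id="unknown"):
    """Return the records sorted oldest-first by ``date_key``.

    Assumptions (not confirmed by the library docs):
    - dates are strings formatted as YYYY-MM-DD
    - rows with a missing or unparseable date are dropped and counted
    """
    kept = []
    skipped = 0
    for row in records:
        try:
            parsed = datetime.strptime(row[date_key], "%Y-%m-%d")
        except (KeyError, TypeError, ValueError):
            skipped += 1
            continue
        kept.append((parsed, row))

    # Sort on the parsed date only, so the row dictionaries are never compared.
    kept.sort(key=lambda pair: pair[0])

    if skipped:
        # The dataset identifier makes the message traceable to one dataset.
        print(f"{dataset_id}: skipped {skipped} row(s) without a valid {date_key}")
    return [row for _, row in kept]


if __name__ == "__main__":
    rows = [
        {"date": "2019-01-03", "close": 3},
        {"date": "2019-01-01", "close": 1},
        {"close": 9},
        {"date": "2019-01-02", "close": 2},
    ]
    for row in scrub_sort_by_date(rows, dataset_id="SPY"):
        print(row)
```

With a pandas.DataFrame, the analogous ordering step would typically be ``DataFrame.sort_values`` on the date column.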