Extraction Utility Helper - Perform Extraction from Redis or S3

Dataset Extraction Utilities

Helper for extracting a dataset from Redis or S3 and loading it into a pandas.DataFrame. It is designed to be agnostic to the dataset's source (IEX vs. Yahoo), so the extract and load operations run without any knowledge of the underlying dataset.
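A minimal, stdlib-only sketch of what "source-agnostic" extraction can look like. It assumes that each backend can be reduced to a fetcher that takes a key and returns stored JSON bytes or None, and that Redis is tried before S3. The names extract_records and Fetcher, the key "SPY_daily", and the placeholder rows are hypothetical and are not part of analysis_engine.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher type: takes a key, returns the stored bytes or None when absent.
Fetcher = Callable[[str], Optional[bytes]]


def extract_records(key: str, fetchers: List[Fetcher]) -> List[Dict]:
    """Return the JSON rows stored under ``key`` by the first fetcher that has them."""
    for fetch in fetchers:
        raw = fetch(key)
        if raw is not None:
            # The payload is parsed generically, so IEX and Yahoo rows share this path.
            return json.loads(raw)
    # No backend had the key: return an empty list rather than raising.
    return []


# In-memory stand-ins for Redis and S3 (placeholder data, not a real dataset).
redis_store: Dict[str, bytes] = {}
s3_store: Dict[str, bytes] = {
    "SPY_daily": json.dumps([{"date": "2019-02-15", "close": 100.0}]).encode(),
}

# The Redis stand-in is consulted first, then S3; the caller never learns which one answered.
records = extract_records("SPY_daily", [redis_store.get, s3_store.get])
```

With pandas installed, pandas.DataFrame(records) turns the rows into a DataFrame; that step is left out so the sketch runs on the standard library alone.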

Supported environment variables:

# verbose logging in this module
export DEBUG_EXTRACT=1

# verbose logging for just Redis operations in this module
export DEBUG_REDIS_EXTRACT=1

# verbose logging for just S3 operations in this module
export DEBUG_S3_EXTRACT=1

# to show debug and trace logging, export ``SHARED_LOG_CFG``
# as the path to a debug logger JSON file. To turn on debugging
# for this library with the repo's included file, run:
export SHARED_LOG_CFG=/opt/sa/analysis_engine/log/debug-logging.json
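The variables above could be consumed roughly as follows. This is a hedged sketch, not the library's code: it assumes a flag is on only when exported as "1", that DEBUG_EXTRACT also enables the Redis and S3 sub-flags, and that the SHARED_LOG_CFG file holds a logging.config.dictConfig dictionary. The helpers env_flag, debug_settings, and apply_shared_log_cfg are hypothetical names; they take the environment as a parameter so they can be exercised without touching os.environ.

```python
import json
import logging.config
import os
from typing import Dict, Mapping


def env_flag(env: Mapping[str, str], name: str) -> bool:
    # Assumed convention: a flag is on only when it is exported as "1".
    return env.get(name, "0") == "1"


def debug_settings(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Resolve the debug switches; DEBUG_EXTRACT is assumed to enable both sub-flags."""
    all_debug = env_flag(env, "DEBUG_EXTRACT")
    return {
        "extract": all_debug,
        "redis": all_debug or env_flag(env, "DEBUG_REDIS_EXTRACT"),
        "s3": all_debug or env_flag(env, "DEBUG_S3_EXTRACT"),
    }


def apply_shared_log_cfg(env: Mapping[str, str] = os.environ) -> bool:
    """Configure logging from the JSON file named by SHARED_LOG_CFG, if that file exists.

    Assumes the file holds a logging.config.dictConfig-style dictionary.
    """
    path = env.get("SHARED_LOG_CFG")
    if not path or not os.path.isfile(path):
        return False
    with open(path) as handle:
        logging.config.dictConfig(json.load(handle))
    return True
```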

analysis_engine.extract_utils.perform_extract(df_type, df_str, work_dict, dataset_id_key='ticker', scrub_mode='sort-by-date', verbose=False)

Helper for extracting from Redis or S3

Parameters:
  • df_type – datafeed type enum
  • df_str – dataset string name
  • work_dict – incoming work request dictionary
  • dataset_id_key – configurable dataset identifier key (default 'ticker') used to track scrubbing and to label debugging errors
  • scrub_mode – scrubbing mode applied during extraction for one-off cleanup before analysis (default 'sort-by-date')
  • verbose – optional boolean that turns on verbose logging (default False)
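To illustrate how scrub_mode and dataset_id_key might interact, here is a stdlib-only sketch operating on a list of row dicts (perform_extract itself loads into a pandas.DataFrame). The function scrub_records, the 'date' row field, the "unknown-dataset" fallback label, and the choice to reject any mode other than 'sort-by-date' are all assumptions of this sketch, not documented library behavior.

```python
from typing import Dict, List


def scrub_records(
    records: List[Dict],
    work_dict: Dict,
    scrub_mode: str = "sort-by-date",
    dataset_id_key: str = "ticker",
) -> List[Dict]:
    """Apply a one-off cleanup pass to extracted rows before analysis (hypothetical)."""
    # dataset_id_key picks the work_dict field used to label errors, e.g. "SPY".
    label = work_dict.get(dataset_id_key, "unknown-dataset")
    if scrub_mode != "sort-by-date":
        # Only the default mode from the signature is modeled in this sketch.
        raise ValueError(f"{label}: unsupported scrub_mode {scrub_mode!r}")
    missing = sum(1 for row in records if "date" not in row)
    if missing:
        raise ValueError(f"{label}: {missing} row(s) have no 'date' field")
    # ISO-8601 date strings sort chronologically as plain strings; input is not mutated.
    return sorted(records, key=lambda row: row["date"])


work = {"ticker": "SPY"}
rows = [
    {"date": "2019-02-15", "close": 2.0},  # placeholder values
    {"date": "2019-02-14", "close": 1.0},
]
scrubbed = scrub_records(rows, work)
```

Labeling errors with the dataset identifier means a failure reads as "SPY: ..." rather than an anonymous exception, which matches the stated purpose of dataset_id_key for debugging.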