Extraction Utility Helper - Perform Extraction from Redis or S3
Dataset Extraction Utilities
Helper for extracting a dataset from Redis or S3 and
loading it into a pandas.DataFrame. It is designed
to ignore the source of the dataset (IEX vs Yahoo) and
perform the extract and load operations without
knowledge of the underlying dataset.
Supported environment variables:
# verbose logging in this module
export DEBUG_EXTRACT=1
# verbose logging for just Redis operations in this module
export DEBUG_REDIS_EXTRACT=1
# verbose logging for just S3 operations in this module
export DEBUG_S3_EXTRACT=1
# to show debug and trace logging, export ``SHARED_LOG_CFG``
# pointing at a debug logger json file. To turn on debugging
# for this library, export the variable to the repo's
# included file with the command:
export SHARED_LOG_CFG=/opt/sa/analysis_engine/log/debug-logging.json
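As an illustration of how flags like these are typically consumed, here is a minimal sketch that reads the three debug variables from the environment. The helper names `flag_enabled` and `extract_debug_flags` are hypothetical and not part of analysis_engine, and the rule that `DEBUG_EXTRACT` also switches on the Redis and S3 flags is an assumption based on the comments above, not confirmed behavior.

```python
import os


def flag_enabled(name, environ=None):
    """Return True when the named variable is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get(name, "0") == "1"


def extract_debug_flags(environ=None):
    """Hypothetical helper: resolve the three debug switches.

    Assumption: DEBUG_EXTRACT covers the whole module, so it also
    turns on the Redis-only and S3-only flags.
    """
    debug_all = flag_enabled("DEBUG_EXTRACT", environ)
    return {
        "extract": debug_all,
        "redis": debug_all or flag_enabled("DEBUG_REDIS_EXTRACT", environ),
        "s3": debug_all or flag_enabled("DEBUG_S3_EXTRACT", environ),
    }


# Example with an explicit mapping instead of the real environment.
flags = extract_debug_flags({"DEBUG_REDIS_EXTRACT": "1"})
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without exporting anything in the shell.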
analysis_engine.extract_utils.perform_extract(df_type, df_str, work_dict, dataset_id_key='ticker', scrub_mode='sort-by-date', verbose=False)
Helper for extracting from Redis or S3
Parameters:
- df_type – datafeed type enum
- df_str – dataset string name
- work_dict – incoming work request dictionary
- dataset_id_key – configurable dataset identifier key for tracking scrubbing and debugging errors
- scrub_mode – scrubbing mode on extraction for one-off cleanup before analysis
- verbose – optional - boolean for turning on logging
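The parameters above can be assembled as in the following sketch, which does not import analysis_engine so that it runs on its own. `build_extract_kwargs` and `dataset_label` are hypothetical helpers, the `{"ticker": "SPY"}` request and the `"daily"` dataset name are assumed placeholder values, and `df_type=None` stands in for the real datafeed type enum, which this page does not list. The return value of `perform_extract` is not documented here, so the sketch does not rely on it.

```python
def build_extract_kwargs(df_type, df_str, work_dict, verbose=False):
    """Collect arguments under the names used in the signature above."""
    return {
        "df_type": df_type,
        "df_str": df_str,
        "work_dict": work_dict,
        "dataset_id_key": "ticker",    # documented default
        "scrub_mode": "sort-by-date",  # documented default
        "verbose": verbose,
    }


def dataset_label(work_dict, dataset_id_key="ticker"):
    """Hypothetical helper: pick the identifier used to tag log messages."""
    return str(work_dict.get(dataset_id_key, "unknown"))


# Assumed minimal request; real work requests likely carry more keys.
work = {"ticker": "SPY"}
kwargs = build_extract_kwargs(df_type=None, df_str="daily", work_dict=work)
label = dataset_label(work)

# With analysis_engine installed and a real df_type enum value,
# the call would look like:
#   from analysis_engine.extract_utils import perform_extract
#   result = perform_extract(**kwargs)
```

Keeping the identifier lookup behind `dataset_id_key` mirrors the parameter's stated purpose: the same key can tag scrubbing and debugging messages even when a request identifies its dataset by something other than a ticker.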