Posted on Aug 03, 2021
Implemented an automated solution for resource configuration, deployment, and scheduling
- Needed consultation on evaluating tools and approaches for cloud adoption. The objective was to offload computing from the existing, outdated on-premises MapR cluster to the cloud.
- Needed a solution custom-built for their live data (largest module) for evaluation and decision-making.
- Needed an automated solution for resource configuration, deployment, scheduling, scalability, etc.
- Needed the ability to process incoming incremental data (10 TB or more) more efficiently.
- Provided a cloud-optimized, on-demand spin-up solution for offloading computation, along with a Snowflake-based reporting solution.
- Each week, 5 TB or more of data is extracted from the on-premises MapR cluster and placed in S3 using shell scripts and the AWS CLI, executed by Airflow jobs.
- Based on the size of the data copied over, an AWS EMR cluster is spun up using CloudFormation templates and the AWS CLI to execute Spark and Pig scripts.
- The processed output from EMR is pushed into S3 buckets for persistence.
- The AWS EMR cluster has auto-scaling enabled and is terminated once processing completes.
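
The size-based spin-up and teardown steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: the sizing rule (`TB_PER_CORE_NODE` and the min/max bounds), the `CoreInstanceCount` template parameter, and the function names are all hypothetical. Only the `aws cloudformation deploy` and `aws cloudformation delete-stack` commands and their flags are standard AWS CLI.

```python
import math
from typing import List

# Hypothetical sizing rule: the write-up does not say how data size maps
# to cluster size, so these numbers are placeholders for illustration.
TB_PER_CORE_NODE = 0.5   # assumed data volume one core node handles per run
MIN_CORE_NODES = 2       # assumed lower bound
MAX_CORE_NODES = 40      # assumed upper bound


def core_nodes_for(data_tb: float) -> int:
    """Pick an initial core-node count from the size of the copied data."""
    wanted = math.ceil(data_tb / TB_PER_CORE_NODE)
    return max(MIN_CORE_NODES, min(MAX_CORE_NODES, wanted))


def deploy_command(stack_name: str, template_file: str, data_tb: float) -> List[str]:
    """Build the AWS CLI call that creates the EMR stack from a template.

    `CoreInstanceCount` is a hypothetical parameter name; it must match
    whatever parameter the CloudFormation template actually declares.
    """
    return [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        "--parameter-overrides",
        f"CoreInstanceCount={core_nodes_for(data_tb)}",
    ]


def teardown_command(stack_name: str) -> List[str]:
    """Build the AWS CLI call that removes the stack after processing."""
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]
```

Each returned list can be handed to `subprocess.run(...)` or joined into a shell command for an Airflow task. Deriving only the starting node count from the data size leaves the cluster's own auto-scaling free to adjust capacity during the run.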