Posted on Aug 03, 2021

Implemented an automated solution for resource configuration, deployment, and scheduling


  • Needed consultation on evaluating tools and approaches for cloud adoption. The objective was to offload computing from the existing, outdated on-premises MapR cluster to the cloud.
  • Needed a solution custom-built for their live data (largest module) for evaluation and decision-making.
  • Needed an automated solution for resource configuration, deployment, scheduling, scalability, etc.
  • Needed the ability to process incoming incremental data (10 TB or more) more efficiently.


  • Provided a cloud-optimized, on-demand spin-up solution for offloading computation, along with a Snowflake-based reporting solution.
  • Each week, 5 TB or more of data is extracted from the on-premises MapR cluster and placed in S3 using shell scripts and the AWS CLI, executed by Airflow jobs.
  • Based on the size of the data copied over, an AWS EMR cluster is spun up using CloudFormation templates and the AWS CLI to execute Spark and Pig scripts.
  • The output data from EMR processing is pushed into S3 buckets for persistence.
  • The AWS EMR cluster has auto-scaling enabled and is terminated once processing completes.
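
The step of sizing the EMR cluster from the copied data volume could look roughly like the sketch below. It is only an illustration under stated assumptions: the sizing constants, the `ClusterPlan` type, the function names, and the CloudFormation parameter names `CoreInstanceCount` and `MaxInstanceCount` are all hypothetical and do not come from the project. Only the `aws cloudformation deploy` command and its `--stack-name`, `--template-file`, and `--parameter-overrides` options are real AWS CLI features.

```python
import math
from dataclasses import dataclass

# Illustrative assumptions, not values from the project.
TB_PER_CORE_NODE = 0.5  # hypothetical data volume one core node handles
MIN_CORE_NODES = 2      # hypothetical lower bound on cluster size
MAX_CORE_NODES = 40     # hypothetical upper bound on cluster size


@dataclass(frozen=True)
class ClusterPlan:
    core_nodes: int           # initial number of core nodes
    autoscale_max_nodes: int  # ceiling that auto-scaling may grow to


def plan_cluster(data_size_tb: float) -> ClusterPlan:
    """Pick an initial node count and auto-scaling ceiling from the data size."""
    if data_size_tb <= 0:
        raise ValueError("data_size_tb must be positive")
    wanted = math.ceil(data_size_tb / TB_PER_CORE_NODE)
    core = max(MIN_CORE_NODES, min(wanted, MAX_CORE_NODES))
    # Allow auto-scaling up to double the initial size, capped at the maximum.
    return ClusterPlan(core, min(core * 2, MAX_CORE_NODES))


def cloudformation_deploy_args(stack_name: str, template_file: str,
                               plan: ClusterPlan) -> list[str]:
    """Build the AWS CLI argument list; the parameter names are hypothetical."""
    return [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        "--parameter-overrides",
        f"CoreInstanceCount={plan.core_nodes}",
        f"MaxInstanceCount={plan.autoscale_max_nodes}",
    ]
```

An Airflow task could pass the resulting list to `subprocess.run(args, check=True)` after the weekly S3 copy finishes. Building the command as a list rather than a shell string avoids quoting problems when stack or template names contain spaces.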