Apache Spark Cost Optimization

Apache Spark/Databricks is often used to process large volumes of data, run complex data science pipelines, and execute other large scheduled jobs. Optimizing these workloads can save millions of dollars per year.

Five Signs Your Company Has a Cloud Spending Problem

  • No one knows all the public cloud accounts the company has open.
  • No one can fully explain the company’s monthly cloud bills.
  • There is significant spend no one can trace to approved initiatives.
  • All compute capacity is being purchased at on-demand prices.
  • No one is regularly reviewing consumption efficiency.

Cost Savings through Workload Optimization

Significant Savings Quickly

  • Workload optimization can be done within weeks.
  • Once an optimized workload is deployed, it starts producing savings immediately.

Minimal Changes

  • No organizational changes.
  • No architectural changes.
  • Possibly minimal infrastructure changes.
  • Possibly minimal code changes.
  • Likely minimal deployment changes.

Sustainable Benefits

  • An optimized workload can keep producing savings indefinitely.
  • Lessons learned ensure that best practices are applied to newly developed workloads, producing further savings.
  • Performance improvements result in better SLA adherence and free compute capacity for additional processing.
  • A second opinion on the existing architecture and best practices.

Delivery Methodology



Outcome Based: No Upfront Costs!



Case Study

Problem:
A Spark pipeline scheduled to run hourly takes 50 min on 80 r4.8xlarge instances.
The business needs more extensive functionality in this process, which would push the run time over 60 min and break the SLA.

After optimization:
- 20 min on 60 r4.4xlarge instances, leaving 40 min of headroom within the hourly SLA.
- Cost reduction of over $1M a year!
Scenario  Instance type  Instances  Min/hr  EC2 price/instance-hr  Hourly  Daily   Monthly   Yearly
Before    r4.8xlarge     80         50      $2.1280                $142    $3,405  $102,144  $1,225,728
After     r4.4xlarge     60         20      $1.0640                $21     $511    $15,322   $183,859
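
The arithmetic behind the table can be reproduced with a short Python sketch. It is illustrative only: the names (JobProfile, hourly_cost, yearly_cost) are hypothetical, and it assumes the job runs once every hour, that cost scales linearly with the minutes used per hour (as the min/hr column implies), and that a month is 30 days, which is what the table's monthly figures correspond to.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30    # assumption: matches the table's monthly figures
MONTHS_PER_YEAR = 12


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical description of one hourly Spark job run."""
    instance_type: str
    instances: int
    minutes_per_run: float          # minutes of cluster time used each hour
    price_per_instance_hour: float  # USD, taken from the table

    def hourly_cost(self) -> float:
        # Assumption: instances are billed only for the minutes they run.
        used_fraction = self.minutes_per_run / 60
        return self.instances * used_fraction * self.price_per_instance_hour

    def yearly_cost(self) -> float:
        return (self.hourly_cost() * HOURS_PER_DAY
                * DAYS_PER_MONTH * MONTHS_PER_YEAR)


before = JobProfile("r4.8xlarge", 80, 50, 2.1280)
after = JobProfile("r4.4xlarge", 60, 20, 1.0640)

yearly_savings = before.yearly_cost() - after.yearly_cost()

for label, job in (("Before", before), ("After", after)):
    print(f"{label}: ${job.hourly_cost():,.0f}/hr, ${job.yearly_cost():,.0f}/yr")
print(f"Yearly savings: ${yearly_savings:,.0f}")
```

Under these assumptions the sketch arrives at the same yearly totals as the table, and the difference between them is the cost reduction of over $1M a year cited above.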

