Workload Management in Modern Data Platforms
Workload Management (WLM) configuration for high-concurrency engines is a complex task with competing objectives. In this article, I will cover simple concepts that will hopefully shed some light on workload management. I will be using Dremio as the example, but the concepts apply to other high-concurrency data processing engines.
A Workload is a set of Jobs, such as SQL queries. Each Query consists of Tasks that can be grouped into Phases; some Tasks can be processed in parallel.
Velocity is simply a measure of queries per unit of time (usually per second, or QPS). There are two types of Velocity:
- Workload Velocity is how many queries per second a cluster receives for processing.
- Processing Velocity is how many queries per second a cluster is processing.
Note that Workload Velocity is usually not constant: it has seasonality and randomness. Processing Velocity has an upper bound determined by the capacity and performance of the resources available to the cluster.
Cluster Capacity is defined by the quantity and performance of the cluster's resources, such as CPU, memory, network, and disks. It can be measured as the number of tasks the cluster can process concurrently, in parallel.
Concurrency is the level of parallelism of a workload running on a cluster. It is usually measured as the number of queries running in parallel on an engine, although, depending on the engine, it can also be measured as the number of concurrent tasks.
Each query in a workload is placed in the WLM Queue prior to being executed on the cluster. If the Workload Velocity exceeds the Processing Velocity, the queue will grow, and queries will have to wait in it until currently running queries finish and release the required resources.
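As a minimal sketch of this behavior (the arrival and processing rates below are made-up numbers, not from the article), the queue dynamics look like this:

```python
# Illustrative sketch: queue growth when Workload Velocity exceeds
# Processing Velocity. The rates below are made-up example numbers.

def simulate_queue(workload_qps, processing_qps, seconds):
    """Track queue size over time; the queue never goes negative."""
    queue = 0.0
    history = []
    for _ in range(seconds):
        queue = max(0.0, queue + workload_qps - processing_qps)
        history.append(queue)
    return history

# Workload arrives at 25 QPS but the cluster only processes 20 QPS,
# so the queue grows by 5 queries every second.
growth = simulate_queue(workload_qps=25, processing_qps=20, seconds=10)
print(growth[-1])  # 50.0 queries waiting after 10 seconds
```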
WLM for isolating High SLA Workload
Workload Management is often used to isolate workloads with different SLAs. The High SLA Queue in the diagram below allows important queries to bypass the General Queue and get processed faster. In addition, Dremio allows you to configure CPU priority, which gives High SLA queries more CPU time so they execute faster.
WLM Concurrency Configuration
Concurrency is usually configurable and can be set to exceed Cluster Capacity. Let's assume that an average Query in the Workload has 2 Tasks to process. If we run 4 Queries or fewer simultaneously on an engine with a Cluster Capacity of 8 tasks, they will all be processed in parallel.
However, if we try to run 8 Queries with 2 Tasks each, we will need to run 16 Tasks, which exceeds the Cluster Capacity of 8. Most high-concurrency clusters will easily run a workload that significantly exceeds their capacity. This is usually done by slicing CPU time between tasks, allocating a time slot to each task, as shown in the diagram below.
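A hypothetical sketch of this time-slicing idea, assuming CPU time is sliced evenly between runnable tasks (the `wall_time` helper and its numbers are illustrative):

```python
# Hypothetical even time-slicing sketch (numbers are illustrative).
# capacity = number of tasks the cluster can run truly in parallel.

def wall_time(num_tasks, task_time, capacity):
    """Wall-clock time to finish all tasks when CPU time is sliced evenly."""
    if num_tasks <= capacity:
        return task_time  # everything runs fully in parallel
    # Each task only gets capacity/num_tasks of a CPU, so it runs
    # num_tasks/capacity times longer than it would unrestricted.
    return task_time * num_tasks / capacity

print(wall_time(num_tasks=8, task_time=1, capacity=8))   # -> 1
print(wall_time(num_tasks=16, task_time=1, capacity=8))  # -> 2.0
```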
Workload Management Optimization
Concurrency is often increased in an attempt to raise overall cluster bandwidth and process larger workloads. Let's see if this is a valid approach.
First, let's configure the Concurrency to be equal to the Cluster Capacity and run 8 similar queries, each requiring 2 tasks to complete. Queries 1…4 will start processing, while Queries 5…8 will be placed in the queue until Queries 1…4 finish. It will take 4 time slots to process all 8 queries.
Now, let's configure the Concurrency to be double the Cluster Capacity. Under this condition, all queries will be scheduled for processing without any query being queued. Interestingly, it will take the same amount of time to process this workload: each query now gets only half the CPU time per time slot, so every query takes twice as long to complete.
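A back-of-the-envelope check of this example (assuming, as in the scenario above, that each query needs 2 time slots at full speed and that 4 queries fit within Cluster Capacity at once):

```python
import math

QUERY_SLOTS = 2       # time slots one query needs at full speed (assumed)
QUERIES = 8
PARALLEL_QUERIES = 4  # queries that fit in Cluster Capacity at once

# Config A: Concurrency = Cluster Capacity. Queries run in waves at
# full speed; the rest wait in the queue.
waves = math.ceil(QUERIES / PARALLEL_QUERIES)  # 2 waves
config_a = waves * QUERY_SLOTS                 # 4 time slots

# Config B: Concurrency = 2 x Cluster Capacity. All 8 queries run at
# once, but each gets half the CPU, so each takes twice as long.
slowdown = QUERIES / PARALLEL_QUERIES          # 2x slower per query
config_b = QUERY_SLOTS * slowdown              # also 4 time slots

print(config_a, config_b)  # 4 4.0
```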
Workload Management Math
Let's have some fun with math and derive expressions for Processing Velocity, query runtime, and queued time. If math is boring to you, skip to the next section.
We already know that the Workload Velocity is the number of queries per second arriving at the cluster. The Processing Velocity is how fast the cluster can process queries, and it can be expressed as the Concurrency configured on the cluster divided by the runtime of an average query: V_processing = Concurrency / T_query.
As we have seen in the previous section, query runtime will be longer if Concurrency exceeds Cluster Capacity. In an ideal world, this is a linear dependency, so the query runtime is the runtime of the same query on a cluster not restricted by excess Concurrency, multiplied by the ratio of Concurrency to Cluster Capacity: T_query = T_unrestricted × (Concurrency / Capacity).
If we substitute this query runtime into the Processing Velocity equation, it reduces to the Cluster Capacity divided by the unrestricted query runtime: V_processing = Concurrency / (T_unrestricted × Concurrency / Capacity) = Capacity / T_unrestricted. Interestingly, it does not depend on Concurrency, which confirms our conclusion from the previous section.
The Total Query Time also needs to include the time the query spent waiting in the queue. To calculate the Queue Time, we first need to find the Queue Size. The queue grows at the rate at which arrivals outpace processing: Queue growth = V_workload − V_processing.
Or, substituting Processing Velocity: Queue growth = V_workload − Capacity / T_unrestricted.
In reality, to dissolve a queue we need to process all queries already waiting in it, so the waiting time of a query in the queue is the Queue Size divided by the Processing Velocity: T_queue = Queue Size / V_processing.
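For example, with made-up numbers:

```python
# Made-up example: queue wait time = Queue Size / Processing Velocity.
queue_size = 100      # queries already waiting in the queue
processing_qps = 20   # queries the cluster completes per second

queue_time = queue_size / processing_qps
print(queue_time)  # 5.0 seconds before a newly queued query starts
```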
Let's assume that our Workload Velocity looks like the blue curve and the Cluster Capacity is the orange line on the graph below:
Under these conditions, the Processing Velocity (blue curve on the graph below) will grow until it reaches the Cluster Capacity of 20 and then stay flat, restricted by the Cluster Capacity. However, while the Workload Velocity is higher than the Cluster Capacity, the Queue Size (orange curve) will keep growing. Once the workload drops back to the Cluster Capacity level, the Queue Size will stay flat for as long as the two are equal, and then shrink as the Workload Velocity falls below the Cluster Capacity.
And the total query time will look like this:
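A small simulation can reproduce these dynamics (the workload shape below is made up; the capacity of 20 QPS matches the graph described above):

```python
def simulate(workload_qps, capacity):
    """Step through time; each step the cluster processes up to
    `capacity` queries drawn from new arrivals plus the queue."""
    queue = 0.0
    queue_sizes, processing_qps = [], []
    for arriving in workload_qps:
        processed = min(capacity, arriving + queue)
        queue += arriving - processed
        processing_qps.append(processed)
        queue_sizes.append(queue)
    return processing_qps, queue_sizes

# Workload ramps above the capacity of 20 QPS, then back down.
workload = [10, 15, 20, 25, 30, 30, 25, 20, 15, 10]
pv, q = simulate(workload, capacity=20)
print(pv)  # tracks the workload, then caps at 20 QPS
print(q)   # grows while workload > 20, stays flat at 20, then drains
```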
Real Life Clusters
We ran an experiment: a batch of 1,000 varied queries executed with a constant client concurrency (not velocity) of 150. We repeated the batch, changing the cluster's WLM Concurrency configuration from 5 to 150 between executions.
We discovered that the Total Batch Runtime did not change until a saturation point around a WLM Concurrency of 130–150, at which point it started increasing exponentially.
The Average Query Duration (processing time plus queued time) nicely reflected the pattern of the Total Batch Runtime.
Interestingly, the changes in average query runtime and average query queued time cancelled each other out until the saturation point.
There are several key points to take away:
- The Cluster Capacity is a limiting factor.
- The Concurrency configuration impacts query running time and query queued time in opposite ways.
- Concurrency does not impact total query processing time or overall Processing Velocity.
- With higher Concurrency (a higher level of parallelization), the cluster will have increased overhead and will require more resources to keep data fragments, job context, etc.
- Higher Concurrency will impact individual jobs differently, because jobs have varying demands for different resources.
- It is a good practice to use WLM for isolating High SLA workloads.
- To execute a normal workload and a High SLA workload on a single engine, you might want to configure a relatively small WLM Concurrency for the normal workload to allow faster processing of the High SLA queries. Refer to the Average Query Runtime diagram above.
- Finally, increasing WLM Concurrency will not help with processing workloads larger than the Cluster Capacity can handle.