The Kestra GCP Batch Task Runner is a Kestra feature for running batch jobs on Google Cloud Platform (GCP). It provides a scalable way to run processes that handle large amounts of data, and it can be used for a variety of workloads, such as data processing, machine learning, and ELT (extract, load, transform). If you have not heard of Kestra, I recommend checking out my blog post introducing Kestra here. In short, Kestra is an up-and-coming orchestration platform whose latest feature, task runners, lets you offload task execution to on-demand cloud compute and scale well beyond a single machine. In this blog post, I'll introduce the Kestra GCP Batch Task Runner and walk through a basic usage example.
Let’s dive in!
What is the GCP Batch API?
The Google Cloud Batch API is a service that enables you to run large-scale batch jobs on Google Cloud's infrastructure. It lets you choose any Compute Engine machine type, including custom machine types. One use case that we will dive deeper into is running Docker containers: you specify a container image, and Batch executes the container on a virtual machine (VM).
What is the Kestra GCP Batch Task Runner?
The Kestra GCP Batch Task Runner uses the GCP Batch API to execute tasks as containers on VMs. Kestra handles the orchestration of these VMs, which includes:
- Creating the VM
- Running the VM
- Polling the status of the VM
- Deleting the VM when completed
Our Kestra GCP Task Runner Scenario
We will walk through a small scenario to showcase the Kestra GCP Batch Task Runner feature. This section assumes that you have a running instance of Kestra. If you do not and need to set one up, check out the documentation located here.
We are going to use the following Kestra workflow:
id: taskRunnerExample
namespace: integrations
description: |
  This flow is a task runners example
tasks:
  - id: hello
    type: io.kestra.plugin.scripts.python.Script
    script: |
      from time import sleep

      def main():
          print("STARTING")
          sleep(60)
          print("ENDING")

      if __name__ == "__main__":
          main()
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{ secret('GCP_PROJECT_ID') }}"
      region: us-central1
      bucket: "{{ secret('GCS_LANDING_BUCKET') }}"
      waitUntilCompletion: 86400
      machineType: "e2-medium"
      serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
      computeResource:
        # 2 CPUs, expressed in milliCPU
        cpu: "2000"
        # 4 GB of memory, expressed in MiB
        memory: "4096"
As you can see, this workflow wraps a basic Python script that prints a message, sleeps for 60 seconds, prints a second message, and then exits. The script runs through the Kestra GCP Batch Task Runner on an e2-medium machine; more specifically, the job requests 2 CPUs and 4 GB of memory.
If you want to expand on this example, you could add Kestra's Parallel task and execute multiple Task Runner Python scripts in parallel.
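To make the parallel idea concrete, here is a minimal sketch of what such a flow might look like. It reuses the same task runner settings as the workflow above. The flow ID, the task IDs, and the printed messages are hypothetical, and the Parallel task type shown (io.kestra.plugin.core.flow.Parallel) is an assumption based on recent Kestra releases; older versions may expose it under a different name, so check the plugin documentation for your version.

```yaml
# Hypothetical flow: two Python scripts, each submitted as its own GCP Batch job.
id: taskRunnerParallelExample
namespace: integrations

tasks:
  - id: parallel
    # Assumed type name; verify it against your Kestra version.
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: hello_1
        type: io.kestra.plugin.scripts.python.Script
        script: |
          print("Hello from job 1")
        taskRunner:
          type: io.kestra.plugin.ee.gcp.runner.Batch
          projectId: "{{ secret('GCP_PROJECT_ID') }}"
          region: us-central1
          bucket: "{{ secret('GCS_LANDING_BUCKET') }}"
          machineType: "e2-medium"
          serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"

      - id: hello_2
        type: io.kestra.plugin.scripts.python.Script
        script: |
          print("Hello from job 2")
        taskRunner:
          type: io.kestra.plugin.ee.gcp.runner.Batch
          projectId: "{{ secret('GCP_PROJECT_ID') }}"
          region: us-central1
          bucket: "{{ secret('GCS_LANDING_BUCKET') }}"
          machineType: "e2-medium"
          serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
```

Because each child task carries its own taskRunner block, each script should be submitted as a separate Batch job on its own VM rather than sharing one machine.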
Conclusion
This was a high-level overview of Kestra GCP Batch Task Runners. In this blog post, we discussed the basics of the GCP Batch API, the basics of the Kestra GCP Batch Task Runner, and an example of the Kestra GCP Batch Task Runner being used with a Python script. These Task Runners are truly incredible: with the Parallel task and the Batch Task Runner, we can easily run many processes in parallel.
I highly encourage you to explore these task runners further and discover their capabilities.
In a future blog post, I will give a more detailed and relatable example of GCP Batch Task Runners in an ELT setting.
Thank you and happy coding!