Efficient and effective data pipelines need a powerful orchestrator! Options range from Apache Airflow to Dagster to Spotify’s Luigi and more. There is a hidden gem on the rise, though, that looks set to compete with Airflow in the coming years. Introducing… Prefect!
What is Prefect?
Prefect can be divided into two products:
- Prefect Open Source
- Prefect Cloud
Prefect Open Source
Prefect is open source software: the source code is publicly available, and anyone can view it, change it, and improve it. With the Prefect Open Source product, you run that software yourself, 100% within your own infrastructure.
Prefect Cloud
Prefect Cloud is built on the same Prefect Open Source software, but approaches infrastructure with a hybrid lens. You still deploy the orchestration components yourself (Flows, agents, blocks), but instead of connecting back to your own systems, they connect back to Prefect Cloud, where the UI is hosted and managed by Prefect. On top of the managed UI, you can easily create blocks, deployments, notifications (for example, Slack), and other components within Prefect. Additionally, you can purchase separate workspaces, which act as environments for all of your Prefect entities.
What are Flows and Tasks?
Flows and tasks are explained in depth in the Prefect documentation, but a good comparison it makes is that they are similar to functions, which means they follow the usual pattern of a function:
- Take argument(s)
- Execute some sort of work function
- Return an output, if applicable
Tasks differ in that they are called by Flows and tend to contain a smaller unit of work. They also have many useful features, such as the ability to retry on failure and to be executed concurrently. It is also worth noting that Flows can call tasks or other Flows (sub-Flows).
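To make the "flows and tasks are like functions" idea concrete, here is a minimal plain-Python sketch of the behaviour described above: a small unit of work that retries on failure, and a parent function that calls several of them concurrently. This is not Prefect's implementation. In Prefect itself you would reach for the `@flow` and `@task` decorators (with `retries` set on the task); the names `retrying_task`, `flaky_extract`, `double`, and `etl_flow` below are made up for illustration.

```python
import functools
import time
from concurrent.futures import ThreadPoolExecutor


def retrying_task(retries=2, delay_seconds=0.0):
    """Hypothetical stand-in for a task decorator: re-run the function on failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of retries: surface the error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


attempts = {"count": 0}


@retrying_task(retries=2)
def flaky_extract(n):
    """Fails on the first call, then succeeds, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")
    return list(range(n))


@retrying_task(retries=0)
def double(x):
    """A deliberately small unit of work."""
    return x * 2


def etl_flow(n):
    """The 'flow': takes an argument, calls tasks, returns an output."""
    rows = flaky_extract(n)
    # Run the small units of work concurrently; map preserves input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(double, rows))


result = etl_flow(3)
```

The flow function stays readable as ordinary Python, while retry and concurrency concerns live in the wrapper and the executor. That separation is roughly what an orchestrator's decorators buy you.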
What are Deployments and Agents?
Agents are the actual orchestrators of Prefect. Specifically, an agent is responsible for picking up work from a specified work queue and executing the flows that come through that queue, subject to concurrency limits.
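As a rough mental model, an agent behaves like a worker loop that pulls scheduled runs off a queue and never runs more than a fixed number at once. The sketch below uses only the Python standard library and is an assumption-laden simplification: `run_agent`, `square_flow`, and the tuple-based queue items are hypothetical, and a real agent keeps polling rather than draining the queue once and exiting.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_agent(work_queue, concurrency_limit):
    """Drain the queue, running at most `concurrency_limit` flow runs at once."""
    results = []
    lock = threading.Lock()
    running = 0
    peak = 0  # highest number of simultaneous runs observed

    def execute(flow_fn, params):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            out = flow_fn(**params)
        finally:
            with lock:
                running -= 1
        with lock:
            results.append(out)

    # The pool size enforces the concurrency limit; leaving the `with`
    # block waits for every submitted run to finish.
    with ThreadPoolExecutor(max_workers=concurrency_limit) as pool:
        while True:
            try:
                flow_fn, params = work_queue.get_nowait()
            except queue.Empty:
                break
            pool.submit(execute, flow_fn, params)
    return sorted(results), peak


def square_flow(x):
    """Hypothetical flow used only to give the agent something to run."""
    return x * x


q = queue.Queue()
for i in range(5):
    q.put((square_flow, {"x": i}))

outputs, peak = run_agent(q, concurrency_limit=2)
```

Here `outputs` collects the results of all five runs, and `peak` records how many were ever in flight at the same time, which stays at or below the limit.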
Deployments are a confusing topic, but my understanding is that they are what allow the developer to schedule flows and execute them through the CLI, API, and Cloud UI. Basically, a deployment does not contain the actual infrastructure blocks or flows; instead, it stores metadata about the flow's code, such as where it is located and when to run it. Think of a “Deployment” as a YAML file. The Prefect documentation covers this very well.
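One way to picture "metadata about the flow, not the flow itself" is a small record that only says where the code lives, when to run it, and which work queue should pick it up. The dataclass below is a sketch of that idea in plain Python. The class `DeploymentSketch`, its field names, and the example values are all assumptions for illustration and should not be read as Prefect's actual deployment schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DeploymentSketch:
    """Hypothetical record: describes a flow run, but holds no flow code."""
    name: str
    flow_name: str
    entrypoint: str        # where the code is located, as "file.py:function"
    work_queue_name: str   # which queue an agent should pick this up from
    cron: str              # when to run it
    parameters: dict = field(default_factory=dict)


deployment = DeploymentSketch(
    name="nightly",
    flow_name="etl_flow",
    entrypoint="flows/etl.py:etl_flow",  # made-up path
    work_queue_name="default",
    cron="0 2 * * *",                    # every day at 02:00
    parameters={"n": 3},
)

# A plain dict like this is what could be serialised to a YAML file.
record = asdict(deployment)
```

Nothing in `record` is executable: it is pure description, which is why the same flow code can sit behind several deployments with different schedules or parameters.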
Conclusion
In this blog post, I introduced Prefect, a rapidly growing and powerful open source orchestration tool. I covered the basic concepts of Flows, Tasks, Agents, and Deployments. This only covers some of Prefect, but in my next blog post, I will cover Infrastructure and Storage blocks, with an example of a hybrid infrastructure using AWS ECS Fargate and the Prefect Cloud UI.
Happy Coding!
Jack P.
Data engineer at Digible, Inc.