Data engineering career switch

If you're looking to start a career in data engineering or considering a career switch, focus on the following key areas -


𝗗𝗮𝘁𝗮 𝗶𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗼𝗻:

Extract -
• Full extracts
• Incremental extracts
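
As a rough illustration of the two extract styles, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a source database. The `orders` table, its `updated_at` column, and both helper functions are hypothetical. The incremental version assumes the source has a reliable change-tracking column that can serve as a watermark.

```python
import sqlite3


def full_extract(conn):
    """Full extract: read every row on every run."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders ORDER BY id"
    ).fetchall()


def incremental_extract(conn, watermark):
    """Incremental extract: read only rows changed after the last watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # The new watermark is the latest change seen; keep the old one if nothing changed.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


# Hypothetical source table, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02"), (3, 30.0, "2024-01-03")],
)

all_rows = full_extract(conn)
new_rows, watermark = incremental_extract(conn, "2024-01-01")
```

In a real pipeline the watermark would be persisted between runs (for example in a small state table) so the next run picks up where the previous one stopped.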
  
Load -
With databases, learn how to implement load patterns such as:
• Insert-only loads
• Insert and update (aka upsert) loads
• Insert, update, and delete (aka merge) loads
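
The three database load patterns can be sketched as follows, again with sqlite3 standing in for a real database. The `customers` table and the function names are hypothetical. The `merge` helper assumes each batch is a non-empty, complete snapshot of the source, so any key missing from the batch is deleted. PostgreSQL accepts a similar `INSERT ... ON CONFLICT ... DO UPDATE` statement, though the parameter placeholders depend on the driver you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def insert_only(conn, rows):
    """Insert-only load: every incoming row is treated as new."""
    conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)


def upsert(conn, rows):
    """Upsert load: insert new keys, update rows whose key already exists."""
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?) "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name",
        rows,
    )


def merge(conn, rows):
    """Merge load: upsert the batch, then delete keys missing from it.

    Assumes `rows` is a non-empty, complete snapshot of the source.
    """
    upsert(conn, rows)
    keys = [row[0] for row in rows]
    placeholders = ", ".join("?" for _ in keys)
    conn.execute(f"DELETE FROM customers WHERE id NOT IN ({placeholders})", keys)


insert_only(conn, [(1, "Ana"), (2, "Ben")])   # table: Ana, Ben
upsert(conn, [(2, "Benjamin"), (3, "Cy")])    # Ben renamed, Cy added
merge(conn, [(1, "Ana"), (3, "Cyrus")])       # Cy renamed, id 2 deleted

result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```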

With files, learn to use columnar file formats like Parquet and load patterns such as:
• Overwrite file
• Append-only to a folder
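
The two file load patterns can be sketched with the standard library alone. To keep the example free of extra dependencies, CSV stands in for Parquet here. With Parquet, only the file-writing call would change, while the overwrite and append-only logic would stay the same. The file names, the `run_id` naming scheme, and the helper functions are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(path, rows):
    """Write rows to a single file, replacing it if it already exists."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def overwrite_file(target, rows):
    """Overwrite pattern: the target file always holds the latest full dataset."""
    write_rows(target, rows)


def append_to_folder(folder, rows, run_id):
    """Append-only pattern: each run adds a new file and never touches old ones."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"part-{run_id}.csv"
    if path.exists():
        raise FileExistsError(f"{path} already exists; append-only files are immutable")
    write_rows(path, rows)
    return path


base = Path(tempfile.mkdtemp())

# Two runs of the overwrite pattern leave exactly one file with the latest data.
overwrite_file(base / "customers.csv", [["1", "Ana"]])
overwrite_file(base / "customers.csv", [["1", "Ana"], ["2", "Ben"]])

# Two runs of the append-only pattern leave one file per run.
append_to_folder(base / "orders", [["1", "10.0"]], run_id="2024-01-01")
append_to_folder(base / "orders", [["2", "20.0"]], run_id="2024-01-02")

order_files = sorted(p.name for p in (base / "orders").iterdir())
```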

𝗗𝗮𝘁𝗮 𝘁𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻:
Learn to transform data using DataFrames and SQL

SQL -
• Learn to transform data using SQL in an open-source database like PostgreSQL
• Learn to perform complex aggregations using window functions
• Learn to decompose your transformation logic using Common Table Expressions (CTEs)
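
To make the CTE and window function ideas concrete, here is a small sketch. It runs against an in-memory SQLite database through Python's sqlite3 module purely so it is self-contained. The `sales` table and its columns are hypothetical, and the query should also be valid in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "2024-01-01", 100.0),
        ("east", "2024-01-01", 50.0),
        ("east", "2024-01-02", 25.0),
        ("west", "2024-01-01", 40.0),
        ("west", "2024-01-02", 60.0),
    ],
)

query = """
WITH daily_sales AS (
    -- Step 1 (CTE): aggregate raw sales to one row per region and day.
    SELECT region, sale_date, SUM(amount) AS daily_total
    FROM sales
    GROUP BY region, sale_date
)
-- Step 2 (window function): running total per region, ordered by day.
SELECT
    region,
    sale_date,
    daily_total,
    SUM(daily_total) OVER (
        PARTITION BY region ORDER BY sale_date
    ) AS running_total
FROM daily_sales
ORDER BY region, sale_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CTE names the intermediate result, so the final SELECT reads as a second, separate step instead of a nested subquery.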

DataFrames -
• Learn how to transform data in a CSV file using the classic DataFrame library, Pandas
• Learn how to transform data in a Parquet file using Polars
• Learn operations like joins, aggregations, group by, and filters
• Learn to write unit tests for your transformation logic using libraries like pytest
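
One way to practice these operations together is to keep the transformation in a pure function and test it with small hand-built DataFrames. The sketch below assumes pandas is installed. The column names and the `total_paid_by_customer` function are hypothetical.

```python
import pandas as pd


def total_paid_by_customer(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Filter to paid orders, join customer names, and sum amounts per customer."""
    paid = orders[orders["status"] == "paid"]                      # filter
    joined = paid.merge(customers, on="customer_id", how="inner")  # join
    return (
        joined.groupby("name", as_index=False)["amount"]           # group by
        .sum()                                                     # aggregation
        .rename(columns={"amount": "total_paid"})
        .sort_values("name")
        .reset_index(drop=True)
    )


def test_total_paid_by_customer():
    """pytest collects functions named test_*; plain asserts are enough."""
    orders = pd.DataFrame(
        {
            "customer_id": [1, 1, 2, 2],
            "amount": [10.0, 5.0, 7.0, 3.0],
            "status": ["paid", "paid", "paid", "cancelled"],
        }
    )
    customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ana", "Ben"]})

    result = total_paid_by_customer(orders, customers)

    expected = pd.DataFrame({"name": ["Ana", "Ben"], "total_paid": [15.0, 7.0]})
    pd.testing.assert_frame_equal(result, expected)
```

Saved in a file whose name starts with `test_`, this can be run with the `pytest` command. Because the function takes and returns DataFrames, the test needs no files or database.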

𝗗𝗮𝘁𝗮 𝗼𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻:
• Learn how to create a Directed Acyclic Graph (DAG) using Python. Something like graphlib.TopologicalSorter from the standard library is enough to get you going
• Learn how to generate logs to keep track of your code execution using logging
• Learn how to write logs into a database like PostgreSQL and generate alerts when a run fails
• Learn how to schedule your Python DAG using cron expressions
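
A minimal sketch of such a DAG runner, using only graphlib and logging from the standard library (graphlib requires Python 3.9 or later). The task functions, the `run_dag` helper, and the logger name are hypothetical placeholders for real pipeline steps.

```python
import logging
from graphlib import TopologicalSorter

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s"
)
logger = logging.getLogger("pipeline")

executed = []  # records execution order so the run can be inspected afterwards


def extract():
    executed.append("extract")


def load():
    executed.append("load")


def transform():
    executed.append("transform")


tasks = {"extract": extract, "load": load, "transform": transform}

# Each key maps to the set of tasks that must finish before it can start.
dag = {"extract": set(), "load": {"extract"}, "transform": {"load"}}


def run_dag(dag, tasks):
    """Run tasks in dependency order, logging each step; stop on the first failure."""
    for name in TopologicalSorter(dag).static_order():
        logger.info("starting %s", name)
        try:
            tasks[name]()
        except Exception:
            # logger.exception records the traceback; re-raise so the run fails visibly.
            logger.exception("task %s failed", name)
            raise
        logger.info("finished %s", name)


run_dag(dag, tasks)
```

Once this lives in a script, a cron expression such as `0 2 * * *` (every day at 02:00) can be used to schedule it.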

𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁:
• Learn how to use Git so that your code is stored in a version control system
• Learn how to deploy your ETL pipeline (extract, load, transform, and orchestration) to a cloud service like AWS
• Learn how to dockerize your application so that it can easily be deployed to a cloud service like AWS Elastic Container Service

Here are some free resources and projects to start your journey -



Mastering these fundamentals will give you the understanding and ability to pick up modern data engineering tools (aka modern data stack) with greater ease.

Congratulations, you now have a solid foundation to start interviewing for data engineer roles!

While there are certainly more advanced topics to explore, the courses and key areas mentioned above will provide you with a strong starting point to showcase your skills and knowledge in interviews.
