Brief Summary
This video provides a comprehensive guide on how to become an Azure Data Engineer, covering essential skills, learning resources, project ideas, and interview preparation tips. It emphasises the growing demand for data engineers and the lucrative career opportunities in the field.
- SQL, Python, Data Warehousing, and Azure Cloud skills are crucial.
- Hands-on projects and practical experience on the Azure portal are essential.
- Certification (DP-203) can significantly boost job prospects.
Why Data Engineering?
The data engineer role emerged as data volumes grew, consolidating responsibilities previously split across several specialists. Demand for data engineers has surged, with job openings reportedly increasing by 300% in recent years. Experienced professionals in this field can earn between £20,000 and £70,000, and salaries are expected to rise further as companies rely on data-driven decision-making.
About Me
Shubham Wadekar shares his journey into data engineering, starting in 2021. He highlights the impact of sharing his knowledge on LinkedIn, where he gained a significant following. Now, he aims to provide valuable content on YouTube to help others transition into data engineering.
Roles in Data Engineering
There are two primary roles in data engineering: Big Data Developer and Cloud Data Engineer. Big Data Developers focus on technologies like Hadoop, Spark, and Kafka. Cloud Data Engineers concentrate on cloud platforms such as AWS, Azure, and GCP. The video focuses on Azure Data Engineering due to the increasing migration of on-premises data to the cloud and the high demand for professionals skilled in cloud technologies, particularly Azure.
How You Can Become an Azure Data Engineer
To become an Azure Data Engineer, a PDF resource is provided with a list of skills and resources. The link to this PDF is in the video description.
SQL
SQL is a foundational skill for data engineering. To master SQL, start with the basics using the "SQL Tutorial for Everyone" playlist by Sumit Sir. Then, learn advanced SQL concepts from Sumit Sir's advanced playlist. Practice SQL questions from the Anit Buel YouTube channel, which includes a playlist of complex SQL questions for interview preparation. Finally, practice with real datasets using the provided list of 20 platforms with over 1,000 questions.
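The complex interview questions mentioned above often hinge on window functions. A minimal, self-contained sketch of that pattern, using Python's built-in sqlite3 and a made-up orders table (the table and values are illustrative, not from the video):

```python
import sqlite3

# Hypothetical orders table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50), ("alice", 70), ("bob", 30), ("bob", 90), ("bob", 20)],
)

# A typical advanced-SQL interview pattern: rank each customer's
# orders by amount using a window function.
rows = conn.execute(
    """
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
    """
).fetchall()
for row in rows:
    print(row)
```

Being able to explain what `PARTITION BY` does here, versus a plain `GROUP BY`, is a common screening question.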
Python
Python is the preferred programming language for data engineering due to its extensive libraries and packages. Start by learning Python fundamentals from Manish Kumar's playlist, which contains 30 videos. Then, learn data manipulation libraries such as Pandas, NumPy, and Matplotlib, covered by the Code Basics YouTube channel.
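The data-manipulation work those libraries cover boils down to a few recurring operations: grouping, aggregating, and deriving new columns. A small Pandas sketch with made-up sales data (the column names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sales data for illustration.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "sales": [100, 150, 80, 120],
})

# Typical data-manipulation tasks: group, aggregate, add a derived column.
summary = df.groupby("region", as_index=False)["sales"].sum()
summary["share"] = summary["sales"] / summary["sales"].sum()
print(summary)
```

The same group-aggregate-derive pattern reappears later in PySpark, just at a much larger scale.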
Data Warehousing and Data Modeling
Data warehousing involves storing a company's data in an organised way to facilitate better decision-making by analysing past trends. Data modelling is creating a blueprint for how data will be stored and organised in a database or data warehouse. To learn data warehousing and modelling, go through the 20-video series by Manish Kumar and five articles on cracking data modelling interviews. Key topics include entities, attributes, primary and foreign keys, normalisation, denormalisation, star and snowflake schemas, data relationships, indexing, slowly changing dimensions, and fact and dimension tables.
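The fact/dimension and star-schema ideas above can be made concrete with a tiny example. This sketch uses sqlite3 and invented table names (`dim_product`, `fact_sales`) purely to illustrate the modelling pattern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per product (hypothetical example).
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    -- Fact table: one row per sale, referencing the dimension
    -- through a foreign key (the "star" join pattern).
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     INTEGER
    );
""")
conn.execute("INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5)")

# Analytical queries join facts to dimensions and aggregate.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY total DESC
""").fetchall()
print(rows)
```

A snowflake schema would simply normalise `dim_product` further (e.g. splitting out a category table), trading simpler joins for less redundancy.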
Linux
A video is recommended that covers all the Linux commands required for data engineering.
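As a taste of what that video covers, a few commands that come up constantly when inspecting data files on Linux (the file name and contents below are made up for illustration):

```shell
# Create a small CSV to work with (illustrative data).
printf 'id,city\n1,pune\n2,delhi\n3,pune\n' > sample.csv

head -n 2 sample.csv        # preview the first lines of a file
grep -c 'pune' sample.csv   # count lines matching a pattern
wc -l sample.csv            # count total lines in the file
rm sample.csv               # clean up
```

Piping these together (`grep`, `sort`, `uniq`, `awk`) handles a surprising amount of day-to-day data inspection before any heavier tooling is needed.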
PySpark
PySpark is a crucial big data technology that combines SQL and Python, enabling efficient processing of large datasets. While SQL works well for small to medium datasets, PySpark distributes computations across multiple machines, making it faster for big data processing. Use the provided playlist to understand the fundamentals of Spark, and go through a 40-video playlist for interview preparation.
Azure Cloud
Focus on specific Azure services relevant to data engineering: Azure Data Factory (ADF) for ETL, Azure Databricks for large-scale data processing, Azure Synapse Analytics for data warehousing, Azure Data Lake Storage (ADLS) or Azure Blob Storage for scalable storage, Azure Stream Analytics for real-time analytics, Azure Cosmos DB as a NoSQL database, and HDInsight for managed big data analytics. Additionally, learning Azure Logic Apps, Azure Functions, and Microsoft Fabric is beneficial. For resources, use the Geek Coder playlist for ADF, a Udemy course for Databricks, a playlist for Synapse Analytics, and a Udemy course for the remaining services.
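To make the ADF-for-ETL point concrete, a heavily simplified sketch of what a Data Factory pipeline definition looks like when authored as JSON. All names here (`CopyBlobToSql`, the dataset references) are hypothetical placeholders, not from the video:

```json
{
  "name": "CopyBlobToSql",
  "properties": {
    "activities": [
      {
        "name": "CopyRawFiles",
        "type": "Copy",
        "inputs": [
          { "referenceName": "RawBlobDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "StagingSqlDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}
```

In practice you build this through the ADF visual designer, but recognising the underlying JSON helps when reviewing pipelines in source control.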
Git, GitLab, and CI/CD
Git, GitLab, and CI/CD are essential for managing code, tracking changes, and ensuring smooth production runs. For a better understanding of Git and CI/CD, go through two YouTube videos: "Complete Git and GitHub Tutorial for Beginners" and "GitLab CI/CD Tutorial for Beginners".
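To illustrate what "CI/CD" means in practice, a minimal sketch of a GitLab pipeline definition (`.gitlab-ci.yml`). The stage names, image, and scripts below are assumptions for illustration, not from the video:

```yaml
# Illustrative .gitlab-ci.yml sketch.
stages:
  - test
  - deploy

run-tests:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - pytest

deploy-pipeline:
  stage: deploy
  script:
    - echo "Publish artifacts or trigger a deployment here"
  only:
    - main
```

Each push triggers the `test` stage automatically; the `deploy` stage runs only on the main branch, which is the "smooth production runs" part of the story.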
Projects
After completing the learning phase, focus on building end-to-end projects to combine various services. A list of three projects available on YouTube is provided, which will give you an understanding of services like Data Factory, Databricks, Synapse Analytics and ADLS, along with SQL, Python, and PySpark.
Practice on Azure Portal
Create a free account on the Azure portal to gain practical experience. Azure provides $200 of free credit plus 12 months of free access to selected services. Use the portal to explore the different services and how they integrate.
Interview Preparation
A product containing data engineering interview experiences for over 100 companies is available. This resource includes detailed information on interview rounds (screening, technical, system design, behavioural, HR) and types of questions asked (theoretical, scenario-based, coding, system design). The link to purchase this product is in the description section.
Certification
After completing the learning and projects, consider obtaining the Azure Data Engineer Associate (DP-203) certification from Microsoft to validate your knowledge and stand out when applying for jobs. This certification increases your chances of getting interview calls.