TL;DR
Alright fam, this is a comprehensive 7-hour course on Python beyond the fundamentals, tailored for data professionals. It covers essential topics like OOP, inheritance, polymorphism, decorators, multithreading, schema validation, async Python, API interactions, testing with PyTest, and the OS module. The course uses doodle graphics for easy understanding and is structured chapter-wise for step-by-step learning.
- OOP concepts, classes, and objects
- Inheritance, polymorphism, and decorators
- Multithreading and asynchronous Python
- API interactions and PyTest
- OS module for data loading
Introduction [0:00]
Python fundamentals are just the beginning. To land a job in the data domain, you need to know more than just loops and functions. This course covers advanced concepts like classes, objects, encapsulation, inheritance, polymorphism, decorators, multithreading, schema validation, asynchronous Python, API interactions, PyTest, and the OS module. The course is structured in chapters and uses doodle graphics for easy understanding.
Pre-Requisites [2:35]
To follow this course, you should have fundamental knowledge of Python, including how to write if-else conditions, for loops, and functions. The course agenda includes OOP concepts, classes, objects, decorators, inheritance, the OS module, polymorphism, multithreading, asynchronous Python, APIs, PyTest, JWT, query parameters, and running coroutines concurrently with asyncio.gather.
OOP Python Overview [6:54]
OOP with Python is essential for building production-level projects. OOP promotes code reusability, scalability, and readability. While functions and variables are building blocks for reusability, OOP allows you to create bundles (classes) of related functions and variables, making code more manageable. Classes offer scalability and readability, allowing you to add functions without disturbing other parts of the code.
Classes & Objects [26:17]
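To create a class, use the class keyword followed by the class name. Inside the class, bundle related functions (methods) and variables (attributes). Instance methods take self as their first parameter; Python passes in the object the method was called on automatically. To use a class, create an object (an instance) of it by calling the class like a function. You can create as many objects as you need from the same class, which is what makes the code reusable.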
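A minimal sketch of a class with default attributes and one method, plus two objects created from it. The FileReader name and its attributes are hypothetical, chosen only to keep the example in the data domain.

```python
class FileReader:
    # Class attributes: default values shared by every object of this class.
    file_format = "csv"
    separator = ","

    def describe(self) -> str:
        # `self` is the object this method was called on.
        return f"Reading {self.file_format} files split on '{self.separator}'"


# Create two independent objects from the same class.
reader_a = FileReader()
reader_b = FileReader()
reader_b.separator = "|"  # sets an attribute on reader_b only

print(reader_a.describe())  # Reading csv files split on ','
print(reader_b.describe())  # Reading csv files split on '|'
```

Changing reader_b does not affect reader_a, because the assignment creates an attribute on that one object.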
Constructors [35:06]
A constructor initializes each new object. Define the special method __init__(self, ...); Python calls it automatically whenever an object of the class is created. Its parameters let the caller supply their own values instead of relying on defaults hard-coded by the class author. Inside __init__, assign those values to self attributes (self.name = name) so every other method of the object can access them.
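A minimal sketch of a constructor, again using a hypothetical FileReader class. Default parameter values keep the old behavior, while callers can override them.

```python
class FileReader:
    def __init__(self, file_format: str = "csv", separator: str = ","):
        # __init__ runs automatically every time an object is created.
        # Storing the arguments on `self` makes them available to every method.
        self.file_format = file_format
        self.separator = separator

    def describe(self) -> str:
        return f"Reading {self.file_format} files split on '{self.separator}'"


default_reader = FileReader()                   # falls back to the defaults
pipe_reader = FileReader("txt", separator="|")  # caller supplies its own values

print(default_reader.describe())  # Reading csv files split on ','
print(pipe_reader.describe())     # Reading txt files split on '|'
```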
Encapsulation - Access Modifiers [49:29]
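Python expresses encapsulation through naming conventions usually described as access modifiers: public, private, and protected. Public attributes can be read and changed from anywhere. Private attributes (a double leading underscore, __name) are name-mangled to _ClassName__name, so they cannot be accessed by their plain name outside the class, although the mangled name still works. Protected attributes (a single leading underscore, _name) are a signal to other developers that the attribute is internal and should not be changed from outside; Python does not enforce this.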
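A small sketch showing how the three conventions behave at runtime. DatabaseConnector and its attributes are hypothetical names for illustration.

```python
class DatabaseConnector:
    def __init__(self, host: str, password: str):
        self.host = host            # public: free to read and change
        self._retries = 3           # protected: internal by convention only
        self.__password = password  # private: stored as _DatabaseConnector__password

    def has_password(self) -> bool:
        # Inside the class, the private attribute is reachable by its plain name.
        return bool(self.__password)


conn = DatabaseConnector("db.example.com", "s3cret")
print(conn.host)      # db.example.com
print(conn._retries)  # 3 (allowed, but signals "internal, don't rely on this")

try:
    conn.__password   # the plain name does not exist outside the class
except AttributeError:
    print("plain private name is not accessible")

print(conn._DatabaseConnector__password)  # s3cret (the mangled name still works)
```

So "private" in Python means "hidden from accidental access", not "impossible to reach".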
End-To-End Python Data Example [1:07:38]
This section demonstrates creating a class for data extraction that lets users read several file formats through one interface. The class includes methods for fetching CSV, JSON, and Parquet files. The fetch_text method takes a separator argument, so the same method can read any delimited text format (comma-, tab-, or pipe-separated). Supporting a new format means adding one method, which keeps the class reusable and scalable.
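A minimal sketch of such an extraction class using only the standard library for text and JSON. The DataExtractor name and return types are assumptions, not the course's exact code. The Parquet method assumes pandas plus a Parquet engine (pyarrow or fastparquet) is installed, and imports pandas lazily so the other methods work without it.

```python
import csv
import json
import os
import tempfile


class DataExtractor:
    """Hypothetical extractor: one object, several file formats."""

    def fetch_text(self, path: str, sep: str = ",") -> list[dict]:
        # sep="," reads CSV, sep="\t" reads TSV, sep="|" reads pipe-delimited files.
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f, delimiter=sep))

    def fetch_json(self, path: str):
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    def fetch_parquet(self, path: str):
        # Assumes pandas and a Parquet engine are installed (not needed elsewhere).
        import pandas as pd
        return pd.read_parquet(path)


extractor = DataExtractor()
with tempfile.TemporaryDirectory() as tmp:
    pipe_file = os.path.join(tmp, "sales.txt")
    with open(pipe_file, "w", encoding="utf-8") as f:
        f.write("id|amount\n1|9.99\n2|15.00\n")
    json_file = os.path.join(tmp, "config.json")
    with open(json_file, "w", encoding="utf-8") as f:
        json.dump({"source": "sales", "active": True}, f)

    rows = extractor.fetch_text(pipe_file, sep="|")
    config = extractor.fetch_json(json_file)

print(rows)    # [{'id': '1', 'amount': '9.99'}, {'id': '2', 'amount': '15.00'}]
print(config)  # {'source': 'sales', 'active': True}
```

Note that csv.DictReader returns every value as a string; type conversion is a separate step.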
Class Methods vs Static Methods [1:20:06]
Python classes have three kinds of methods: instance methods, class methods, and static methods. Instance methods are the default; they receive the object as self and work with its attributes. Class methods are defined with the @classmethod decorator and receive the class itself as cls, so they work with class attributes and are often used as alternative constructors. Static methods are defined with the @staticmethod decorator and receive neither self nor cls; they are plain utility functions grouped inside the class because they belong with it logically.
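A sketch contrasting the three method types on one hypothetical Pipeline class. The config-dict format passed to from_config is an assumption for illustration.

```python
class Pipeline:
    runs = 0  # class attribute shared by all instances

    def __init__(self, name: str):
        self.name = name

    def run(self) -> str:
        # Instance method: receives the object as `self`.
        Pipeline.runs += 1
        return f"running {self.name}"

    @classmethod
    def from_config(cls, config: dict) -> "Pipeline":
        # Class method: receives the class as `cls`; used here as an alternative constructor.
        return cls(config["name"])

    @classmethod
    def total_runs(cls) -> int:
        return cls.runs

    @staticmethod
    def is_valid_name(name: str) -> bool:
        # Static method: no `self` or `cls`, just a related utility.
        return name.isidentifier()


p = Pipeline.from_config({"name": "daily_sales"})
print(p.run())                                # running daily_sales
print(Pipeline.total_runs())                  # 1
print(Pipeline.is_valid_name("daily sales"))  # False
```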
Inheritance in Python [1:44:28]
Inheritance lets a new (child) class reuse the code of an existing (parent) class. The child class gets all of the parent's methods and attributes, can extend them with its own, and can call the parent's versions through super(). This avoids writing code from scratch every time. Because the Python community is very active, you can also inherit from classes published in third-party libraries and public repositories.
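A minimal sketch of extending a parent class and reusing its logic with super(). Extractor and LoggingExtractor are hypothetical classes.

```python
class Extractor:
    def __init__(self, source: str):
        self.source = source

    def extract(self) -> list[str]:
        return [f"row from {self.source}"]


class LoggingExtractor(Extractor):
    # Inherits everything from Extractor and adds logging on top.
    def __init__(self, source: str):
        super().__init__(source)  # reuse the parent's setup
        self.log: list[str] = []

    def extract(self) -> list[str]:
        self.log.append(f"extracting from {self.source}")
        return super().extract()  # reuse the parent's logic unchanged


le = LoggingExtractor("s3")
print(le.extract())  # ['row from s3']
print(le.log)        # ['extracting from s3']
```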
Single-Level, Multi-Level, Multiple Inheritance [1:53:40]
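There are different types of inheritance: single-level, multi-level, and multiple. In single-level inheritance, a child class inherits directly from one parent class. In multi-level inheritance, a class inherits from a class that itself inherits from another class, forming a chain. In multiple inheritance, a class inherits from two or more parent classes at once; Python decides which parent's attribute wins by following the class's method resolution order (MRO).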
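A sketch of all three shapes in one place, with hypothetical reader classes. The last line prints the method resolution order Python uses for the multiple-inheritance case.

```python
# Single-level: Reader -> CSVReader
class Reader:
    def read(self) -> str:
        return "generic read"


class CSVReader(Reader):
    def read(self) -> str:
        return "csv read"


# Multi-level: Reader -> CSVReader -> CompressedCSVReader
class CompressedCSVReader(CSVReader):
    def decompress(self) -> str:
        return "gunzip"


# Multiple: one class inherits from two parents at once.
class Validator:
    def validate(self) -> bool:
        return True


class ValidatingCSVReader(CSVReader, Validator):
    pass


print(CompressedCSVReader().read())      # csv read (found on CSVReader)
print(ValidatingCSVReader().validate())  # True (found on Validator)
print([c.__name__ for c in ValidatingCSVReader.__mro__])
# ['ValidatingCSVReader', 'CSVReader', 'Reader', 'Validator', 'object']
```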
Polymorphism in Python [2:26:59]
Polymorphism means "many forms": objects of different classes respond to the same method call, each in its own way. For example, you can have separate classes for fetching data from APIs, databases, and S3 buckets that all expose a fetch method. Code that calls fetch() does not need to know which source it is talking to, so you can add a new source without changing that code, which promotes scalability and reusability.
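A sketch of the fetcher example, assuming an abstract base class (abc.ABC) to formalize the shared fetch() interface; plain duck typing would work too. The class names are hypothetical and the methods return placeholder strings instead of doing real I/O.

```python
from abc import ABC, abstractmethod


class Fetcher(ABC):
    """Common interface: every data source must provide fetch()."""

    @abstractmethod
    def fetch(self) -> str: ...


class APIFetcher(Fetcher):
    def fetch(self) -> str:
        return "rows from REST API"   # placeholder for an HTTP call


class DatabaseFetcher(Fetcher):
    def fetch(self) -> str:
        return "rows from database"   # placeholder for a SQL query


class S3Fetcher(Fetcher):
    def fetch(self) -> str:
        return "rows from S3 bucket"  # placeholder for an S3 download


def ingest(sources: list[Fetcher]) -> list[str]:
    # The caller only knows about fetch(); each object decides how to do it.
    return [source.fetch() for source in sources]


print(ingest([APIFetcher(), DatabaseFetcher(), S3Fetcher()]))
# ['rows from REST API', 'rows from database', 'rows from S3 bucket']
```

Adding a new source means writing one new subclass; ingest() stays untouched.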
Decorators [2:40:18]
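Decorators let you add behavior to a function without changing the function's own code. A decorator is a function that takes another function as an argument and returns a new function that wraps it, typically running extra logic (logging, timing, retries) before or after the original call. Writing @decorator above a function definition is shorthand for func = decorator(func). Decorators are common in modern data frameworks, for example the @task and @dag decorators of Apache Airflow's TaskFlow API and the @udf and @pandas_udf decorators in PySpark.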
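A minimal sketch of a timing decorator. The names log_runtime and transform are hypothetical; functools.wraps copies the original function's name and docstring onto the wrapper.

```python
import functools
import time


def log_runtime(func):
    """Decorator that reports how long the wrapped function took."""

    @functools.wraps(func)  # keep the original __name__ and __doc__
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # call the original, unchanged function
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result

    return wrapper


@log_runtime  # equivalent to: transform = log_runtime(transform)
def transform(rows: list[int]) -> list[int]:
    return [r * 2 for r in rows]


print(transform([1, 2, 3]))  # prints the timing line, then [2, 4, 6]
```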
Multi-Threading [3:06:01]
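Multithreading lets you run multiple tasks concurrently within a single process. It pays off for I/O-bound tasks such as API calls: while one thread waits for a response, other threads can send their requests, so the total run time drops. In standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads do not speed up CPU-bound work. The ThreadPoolExecutor class from concurrent.futures is the standard high-level way to implement multithreading.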
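A sketch of ThreadPoolExecutor on simulated API calls. fetch_page is hypothetical, and time.sleep stands in for network waiting (it releases the GIL, like real I/O does).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page: int) -> str:
    # Stand-in for an API call: sleep simulates waiting on the network.
    time.sleep(0.2)
    return f"page {page}"


pages = [1, 2, 3, 4, 5]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() runs fetch_page in worker threads and yields results in input order.
    results = list(executor.map(fetch_page, pages))
elapsed = time.perf_counter() - start

print(results)  # ['page 1', 'page 2', 'page 3', 'page 4', 'page 5']
print(f"{elapsed:.2f}s")  # roughly 0.2s, versus about 1.0s one after another
```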
Pydantic For Schema Validation [3:28:16]
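Pydantic is a library for data validation and parsing. You define a schema as a class that inherits from BaseModel, using ordinary Python type hints for the fields, and Pydantic checks that incoming data conforms to it, raising a ValidationError when it does not. By default Pydantic coerces compatible values (for example, the string "42" to the integer 42); strict mode turns that coercion off. It also supports serialization and deserialization, such as converting a model to a dict or JSON and back.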
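A sketch assuming Pydantic v2 (model_dump, model_dump_json, model_validate_json, and ConfigDict are v2 APIs; v1 used different method names). The Order schema is hypothetical.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Order(BaseModel):
    order_id: int
    customer: str
    amount: float


class StrictOrder(BaseModel):
    model_config = ConfigDict(strict=True)  # disable type coercion
    order_id: int
    customer: str
    amount: float


# Default (lax) mode: "101" and "19.99" are coerced to int and float.
order = Order(order_id="101", customer="acme", amount="19.99")
print(order.order_id, order.amount)  # 101 19.99

# Serialization to dict / JSON, and deserialization back into a model.
as_dict = order.model_dump()
as_json = order.model_dump_json()
restored = Order.model_validate_json(as_json)

# Data that cannot be coerced raises ValidationError.
try:
    Order(order_id="not-a-number", customer="acme", amount=5)
except ValidationError as exc:
    print(exc.error_count())  # 1 (only order_id is invalid)

# Strict mode rejects a string even if it looks like an int.
try:
    StrictOrder(order_id="101", customer="acme", amount=19.99)
except ValidationError:
    print("strict mode rejected '101' for an int field")
```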
Async Python [3:51:46]
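Asynchronous Python lets a single thread work on multiple tasks concurrently: while one task waits on I/O, others keep running instead of the whole program blocking. This is useful for I/O-bound tasks such as API calls. Functions defined with async def are coroutines; each await marks a point where a coroutine pauses, and the event loop resumes it once the awaited operation has finished.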
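A minimal sketch of a coroutine and the event loop. fetch_user is hypothetical, and asyncio.sleep stands in for a real network call.

```python
import asyncio


async def fetch_user(user_id: int) -> dict:
    # `await` pauses this coroutine and hands control back to the event loop
    # until the (simulated) I/O finishes.
    await asyncio.sleep(0.1)
    return {"id": user_id}


async def main() -> dict:
    coro = fetch_user(1)  # calling a coroutine function only creates a coroutine object
    return await coro     # awaiting it actually runs it


user = asyncio.run(main())  # starts an event loop and runs main() to completion
print(user)  # {'id': 1}
```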
Multiple Coroutines With Gather [4:24:59]
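To run multiple coroutines concurrently, use asyncio.gather. This lets you fetch data from multiple APIs at the same time. All the coroutines run on one thread: the event loop switches to another coroutine whenever the current one is waiting, so no time is spent idling, and the total time is roughly that of the slowest call rather than the sum of all calls. gather returns the results in the same order in which the coroutines were passed in.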
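A sketch of asyncio.gather over three simulated API calls. The fetch coroutine and source names are hypothetical; the delays are chosen so the slowest call finishes last.

```python
import asyncio
import time


async def fetch(source: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a network call
    return f"{source} done"


async def main() -> list[str]:
    # gather schedules all three coroutines on the event loop at once.
    return await asyncio.gather(
        fetch("users_api", 0.3),
        fetch("orders_api", 0.2),
        fetch("products_api", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)  # ['users_api done', 'orders_api done', 'products_api done']
print(f"{elapsed:.1f}s")  # about 0.3s (the slowest call), not 0.6s
```

Even though products_api finishes first, its result is still last in the list, matching the argument order.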
APIs Overview [4:43:09]
APIs (Application Programming Interfaces) act as intermediaries between clients and servers. They give you controlled access to data held in databases and other sources. A web API exposes endpoints, which are URLs you send requests to, and each request uses an HTTP method that states its intent: GET to read data, POST to create it, PUT to update or replace it, and DELETE to remove it.
Fetch Data From APIs [4:55:02]
To fetch data from APIs, you can use the requests library. requests.get sends a GET request and requests.post sends a POST request. Query parameters go in the params argument, custom headers (for example an authentication token such as a JWT) go in headers, and a JSON body for a POST request goes in json.
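A sketch of GET and POST calls with the requests library. The endpoint URL, bearer-token scheme, and payload shape are hypothetical. The final lines use requests.Request(...).prepare() to show the URL that params produce without sending anything over the network.

```python
import requests

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/users"


def fetch_users(page: int, token: str):
    """GET request with query parameters and an auth header."""
    response = requests.get(
        BASE_URL,
        params={"page": page, "limit": 10},            # becomes ?page=...&limit=10
        headers={"Authorization": f"Bearer {token}"},  # e.g. a JWT
        timeout=10,                                    # never wait forever
    )
    response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx
    return response.json()


def create_user(payload: dict, token: str):
    """POST request sending a JSON body."""
    response = requests.post(
        BASE_URL,
        json=payload,  # serialized to JSON; Content-Type is set automatically
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


# Preview the final URL without making a network call.
preview = requests.Request("GET", BASE_URL, params={"page": 2, "limit": 10}).prepare()
print(preview.url)  # https://api.example.com/users?page=2&limit=10
```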
PyTest [5:30:04]
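PyTest is a testing framework for Python that lets you write test cases for your functions, classes, and utilities. Tests are plain functions whose names start with test, and they check results with ordinary assert statements; when an assertion fails, PyTest reports the values involved. Parameterized testing with @pytest.mark.parametrize runs the same test function against many input and expected-output pairs, so you don't have to write one test per case.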
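A sketch of a small PyTest module. clean_column_name is a hypothetical utility under test; save the file as something like test_utils.py and run pytest from that folder.

```python
import pytest


def clean_column_name(name: str) -> str:
    """Hypothetical utility under test: normalize a column header."""
    if not name.strip():
        raise ValueError("column name must not be empty")
    return name.strip().lower().replace(" ", "_")


def test_clean_column_name_basic():
    assert clean_column_name("Order Date") == "order_date"


@pytest.mark.parametrize(
    "raw, expected",
    [("Order Date", "order_date"), ("  ID ", "id"), ("total", "total")],
)
def test_clean_column_name_parametrized(raw, expected):
    # Runs once per (raw, expected) pair listed above.
    assert clean_column_name(raw) == expected


def test_clean_column_name_rejects_empty():
    # pytest.raises passes only if the block raises the given exception.
    with pytest.raises(ValueError):
        clean_column_name("   ")
```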
OS Module [6:13:11]
The os module lets you interact with the operating system. You can use it to create folders, create files, list the contents of a folder, and build incremental data loading, where each run picks up only the files that arrived since the previous run. This makes it essential for data engineers who automate file-based data pipelines.
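A minimal sketch of incremental loading with os, assuming new files land in a folder and processed file names are recorded in a plain-text log. The function names, folder layout, and log format are hypothetical; a temporary directory keeps the demo self-contained.

```python
import os
import tempfile


def find_new_files(landing_dir: str, processed_log: str) -> list[str]:
    """Return files in landing_dir that are not yet recorded in processed_log."""
    processed = set()
    if os.path.exists(processed_log):
        with open(processed_log, encoding="utf-8") as f:
            processed = {line.strip() for line in f if line.strip()}
    return sorted(
        name
        for name in os.listdir(landing_dir)  # read the folder's contents
        if os.path.isfile(os.path.join(landing_dir, name)) and name not in processed
    )


def mark_processed(processed_log: str, names: list[str]) -> None:
    with open(processed_log, "a", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")


with tempfile.TemporaryDirectory() as base:
    landing = os.path.join(base, "landing")
    os.makedirs(landing, exist_ok=True)  # create the folder; no error if it exists
    log = os.path.join(base, "processed.log")

    for name in ("day1.csv", "day2.csv"):
        with open(os.path.join(landing, name), "w", encoding="utf-8") as f:
            f.write("id\n1\n")

    first_run = find_new_files(landing, log)  # both files are new
    mark_processed(log, first_run)

    with open(os.path.join(landing, "day3.csv"), "w", encoding="utf-8") as f:
        f.write("id\n2\n")

    second_run = find_new_files(landing, log)  # only the file added since

print(first_run)   # ['day1.csv', 'day2.csv']
print(second_run)  # ['day3.csv']
```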