Azure Data Factory Components
- Azure Data Factory Overview
- Integration Runtime
- Linked Service
- Pipeline, Dataset and Activity
- Source and Sink Configuration
- Copy Data Activity
- Recursive, Wildcard, File Listing, Mapping, and User Properties
- Parameterization of Copy Data Activity
- Sequential and Parallel Copy
Data Loading in Azure SQL Server
- Upsert, Pre-Copy Script, and Auto Create Table Options
- Handling Multiple Files with Lookup
- Full Load Pipeline
- New Watermark and Old Watermark Concept
- Delta Load for a Single File
- Delta Load for Multiple Files
Pipeline Monitoring
- Monitoring of Activity
- Monitoring of Pipeline
- Execute Pipeline Activity
Azure DevOps and Git
- Azure DevOps Overview
- Azure DevOps Integration with ADF
- Azure Data Factory Integration with Git
- Git Pull, Git Push
- Git Cherry-Pick, Git Revert
Azure Key Vault and Security
- Azure Key Vault
- Managing Keys and Security
- Azure Key Vault Integration with ADF
- Azure Key Vault Integration with Linked Service
Data Modelling and Design
- Fact and Dimension Tables
- Star and Snowflake Schemas
- SCD Types and Implementation
- Creating Stored Procedures for Data Modelling
- Trigger Stored Procedure from ADF
Azure Logic Apps and Notifications
- Azure Logic App
- Sending Email Alerts using Logic App
Project 1 – Migrating Data from MS SQL Server to Cloud
- Full Load and Delta Load with Monitoring and Warehousing
Azure Data Factory Advanced
- Event-Based, Scheduled, and Tumbling Window Triggers
- Data Flow and Data Transformation Activities
- If Condition, Get Metadata, and Web Activities
- Global Parameters
- Parameterizing Triggers
Azure Databricks
- Overview of Databricks
- Spark and Spark Architecture
- Data Lake and Delta Lake
- Delta Table and Features
- Processing CSV, JSON, and XML Files Using PySpark
- Integration of Azure Key Vault with Databricks
- Secret Scope and JDBC Connector
- Writing Data to Azure SQL Server
Project 2 – KPI Dashboarding with Azure Databricks
- Data Processing Using PySpark and Creating a Delta Lake
Azure Synapse Analytics
- Overview of Azure Synapse
- Difference between Synapse and Data Factory
- Synapse Dedicated Pool
- Synapse Serverless Pool
- PolyBase Copy
- Synapse Spark Pool
Project 3 – Dashboarding using Azure Synapse
- Orchestrating the secure migration of data from on-premises servers to Azure Data Lake using Synapse Analytics pipelines
Microsoft Fabric
- Microsoft Fabric Introduction and Sign-Up Process
- OneLake and Lakehouse in Fabric
- Synapse Data Warehouse and Data Engineering in Fabric
- Spark in Fabric and Data Processing
- Real-Time Analytics in Fabric
- Microsoft Fabric with Power BI
Project 4 – A Project on Microsoft Fabric
Big Data Processing Using Databricks and Spark
Introduction to Databricks and Spark Architecture
- Overview of Databricks and Spark Architecture
- Databricks Workspace Overview
- Understanding Databricks Services and Features
- Cluster Management: Creation, Autoscaling, and Administration
RDD and DataFrame Fundamentals
- RDD (Resilient Distributed Dataset) Overview
- Spark DataFrame API and Data Source API Fundamentals
- Conversion between PySpark and Pandas
- Common Transformation Techniques in PySpark
- Transformations vs. Actions
Dbutils and Parameterization
- Usage of Dbutils for File System Interaction
- Parameterization Techniques in Databricks
Delta Lake and Delta Table
- Table Manipulation with Delta Lake
- Read and Process CSV, JSON, and XML Files
- Types of Views in Databricks (Global, Local, Temporary)
- Managed vs. Unmanaged Tables
- Versioning and Time Travel in Delta Lake
Azure DevOps and Git Integration
- Azure DevOps and Git Workflow Integration
- Cherry Pick and Git Revert Commands
- Secret Scope Creation and Management
- JDBC Connector for SQL Server
Project 5 – Retail Dashboarding with Azure Databricks
- Data Processing Using PySpark and Creating a Delta Lake
Databricks CLI and Backup Process
- Databricks CLI Overview and Installation
- Backup Process Setup for Notebooks and Configurations
Understanding Spark UI
- Navigating the Spark UI for Job Monitoring
- Understanding the Stages, Tasks, and Execution Plans
Unity Catalog and SCD (Slowly Changing Dimensions) Implementation
- Overview of Unity Catalog for Data Governance
- SCD Types (Type 1, Type 2, Type 3) and Implementation
Lakehouse and Medallion Architecture
- Introduction to Lakehouse Architecture
- Medallion Architecture (Bronze, Silver, Gold Layers) Overview
Spark Optimization Techniques
- User-Defined Functions (UDF) for Custom Transformations
- Catalyst Optimizer and DataFrame.explain() for Optimization
- Directed Acyclic Graphs (DAG) and Adaptive Query Execution (AQE)
- Predicate Pushdown and Projection Pushdown
- Repartition, Coalesce, Cache, and Persist
Project 6 – Medallion Architecture with Unity Catalog
- Processing data using PySpark and creating a Delta Lake, applying Medallion Architecture for structured data layers
Handling Complex Data and Advanced Joins
- Handling Complex JSON, Struct, and Nested Data Types
- Data Skew and Techniques for Handling Skewed Data
- Sort Merge Join, Broadcast Join, and Optimizing Joins
- Z-Ordering for Efficient Querying
Orchestration and Scheduling Techniques
- Job Orchestration and Scheduling with Databricks Jobs API
- Best Practices for Workflow Orchestration
Resume Preparation and Interview Tips
- Resume Preparation and Interview Tips