Raj Cloud Technologies

AWS Data Engineering Live Online Training

Rating: 4.8/5

(Based on 500+ reviews)

NEW BATCH STARTING SOON!

Live: Instructor-Led Training

This hands-on training is designed to equip aspiring Data Engineers with practical expertise in building scalable and efficient data pipelines using AWS services. Covering key AWS components such as S3, EC2, Lambda, IAM, Glue, Redshift, and more, this job-oriented program focuses on real-time, end-to-end project implementation.

You will gain proficiency in designing cloud-native data architectures, orchestrating workflows with Step Functions, processing big data with EMR, and implementing real-time streaming with Kinesis. Upon completion, you’ll be interview-ready with strong experience in building AWS-based Data Engineering solutions from scratch.

+91 81052 96858

Have Queries? Ask our Experts.

Get More Info, Enquire Now!

We are available 24x7 for your queries.


Our students were hired by:

AWS Data Engineering Online Live Training

Technologies Taught

Course Unique Features

After completing this course, you can apply for roles such as:

Course Curriculum

AWS Data Engineering Training

Python

1. Python Basics

  • What is Python?
  • Why Python for Data Engineering?
  • Installing Python and Setting Up Environment (IDEs, Jupyter, VSCode)
  • Running Python Scripts and Notebooks
  • Basic Syntax and Indentation
  • Variables and Data Types (int, float, str, bool, NoneType)
  • Type Casting and the type() Function
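
For illustration only (not from the official courseware), a tiny snippet of the variable types and type casting listed above:

```python
# Core data types
records_loaded = 1500            # int
error_rate = 0.02                # float
pipeline_name = "orders_daily"   # str
is_incremental = True            # bool
last_error = None                # NoneType

# Type casting and inspecting types with type()
raw_count = "42"
count = int(raw_count)
print(type(count), float(count), str(is_incremental))
```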

2. Operators and Expressions

  • Arithmetic, Comparison, Logical Operators
  • Membership (in, not in) and Identity Operators
  • Operator Precedence and Associativity
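
A small illustrative snippet (values made up) covering the operator families above:

```python
amount, threshold = 250.0, 100.0

# Arithmetic, comparison and logical operators
is_large_even = amount > threshold and amount % 2 == 0
print(is_large_even)                             # True

# Membership operators
stages = ["extract", "transform", "load"]
print("load" in stages, "audit" not in stages)   # True True

# Identity vs equality
a = [1, 2]
b = a
print(a is b, a == [1, 2], a is [1, 2])          # True True False
```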

3. Control Flow

  • if, elif, else Statements
  • while and for Loops
  • Loop Control: break, continue, pass
  • List Comprehensions (important for Glue transformations)
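
For illustration only (records invented), a minimal sketch of the list-comprehension pattern that later reappears inside Glue/PySpark transformations:

```python
orders = [
    {"order_id": 1, "status": "shipped", "amount": 120.0},
    {"order_id": 2, "status": "cancelled", "amount": 40.0},
    {"order_id": 3, "status": "shipped", "amount": 75.5},
]

# Filter and reshape in one expression: keep shipped orders, extract amounts
shipped_amounts = [o["amount"] for o in orders if o["status"] == "shipped"]
print(shipped_amounts)  # [120.0, 75.5]
```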

4. Functions

  • Defining and Calling Functions
  • Parameters and Return Values
  • Lambda Functions (used heavily in PySpark)
  • map(), filter(), reduce() (from functools)
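
A brief, illustrative example (numbers arbitrary) of lambda functions with map(), filter(), and functools.reduce():

```python
from functools import reduce

amounts = [120.0, 40.0, 75.5]

with_tax = list(map(lambda x: round(x * 1.18, 2), amounts))   # apply to every element
large = list(filter(lambda x: x > 100, with_tax))             # keep matching elements
total = reduce(lambda a, b: a + b, with_tax, 0.0)             # fold into one value

print(with_tax, large, total)
```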

5. Data Structures

  • Lists, Tuples, Sets, Dictionaries
  • CRUD operations on each data structure
  • Iterating through collections
  • Common built-in functions (len, sum, sorted, zip, etc.)
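
An illustrative sketch (sample values invented) of CRUD-style operations and common built-ins on the core collections:

```python
# Dictionary CRUD
prices = {"s3": 0.023, "glue": 0.44}    # create
prices["athena"] = 5.0                  # add
prices["glue"] = 0.45                   # update
del prices["s3"]                        # delete

# Lists, tuples and sets
regions = ["us-east-1", "ap-south-1", "us-east-1"]
unique_regions = set(regions)           # de-duplicate
point = (12.97, 77.59)                  # immutable tuple

# Common built-ins
print(len(prices), sum(prices.values()), sorted(unique_regions))
print(list(zip(prices.keys(), prices.values())))
```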

6. String and Date Handling

  • String Manipulation and Formatting
  • split(), join(), slicing, and regex intro (re module)
  • Introduction to datetime and time modules (for partition/date-based transformations)
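
A minimal sketch, with a made-up record, of string splitting, a first regex, and datetime handling in the partition-path style used later on S3:

```python
import re
from datetime import datetime, timedelta

raw = "sales|2024-06-15|store_42"
dataset, load_date, store = raw.split("|")

# Build a partition-style path from the date
dt = datetime.strptime(load_date, "%Y-%m-%d")
partition = f"year={dt:%Y}/month={dt:%m}/day={dt:%d}"
yesterday = (dt - timedelta(days=1)).date()

# Simple regex: pull the numeric store id
store_id = re.search(r"\d+", store).group()

print(partition, yesterday, store_id)
```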

7. Exception Handling

  • Try-Except Blocks
  • Catching Specific Exceptions
  • finally and else in error handling
  • Importance in ETL pipeline robustness
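
A short illustrative sketch of try/except/else/finally around a made-up parsing step, the pattern that keeps an ETL job from failing on one bad record:

```python
def parse_record(raw: str) -> dict:
    """Parse one 'id,amount' line; raises ValueError on bad input."""
    rec_id, amount = raw.split(",")
    return {"id": int(rec_id), "amount": float(amount)}

rows = ["1,100.5", "2,abc", "3,75"]
good, bad = [], []

for row in rows:
    try:
        record = parse_record(row)
    except ValueError as err:      # catch only the exception we expect
        bad.append((row, str(err)))
    else:                          # runs only when parsing succeeded
        good.append(record)
    finally:                       # always runs: release files/connections here
        pass

print(f"loaded={len(good)} rejected={len(bad)}")
```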

8. Intro to OOP (Optional but Useful)

  • Classes and Objects
  • Constructors (__init__)
  • self keyword
  • Simple inheritance and method overriding
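
A tiny illustrative class hierarchy (class names are hypothetical) showing __init__, self, inheritance, and method overriding:

```python
class Extractor:
    """Base class: demonstrates __init__ and self."""
    def __init__(self, source: str):
        self.source = source

    def extract(self) -> str:
        return f"reading from {self.source}"


class S3Extractor(Extractor):
    """Inherits from Extractor and overrides extract()."""
    def extract(self) -> str:
        return f"reading objects from s3://{self.source}"


print(Extractor("local_csv").extract())
print(S3Extractor("my-bucket/raw/").extract())
```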

Data Warehouse

1. Introduction to Data Warehousing

  • What is Data Warehousing?
  • OLTP vs OLAP
  • Data Warehouse Architecture (Single-tier, Two-tier, Three-tier)
  • Components of a Data Warehouse
  • ETL vs ELT in Data Warehousing

2. Data Modeling Fundamentals

  • What is Data Modeling?
  • Conceptual, Logical, and Physical Data Models
  • Key Data Modeling Concepts: Entities, Attributes, Relationships
  • Primary Keys, Foreign Keys, and Constraints
  • Normalization & Denormalization
  • Choosing the Right Model for Analytical Workloads

3. Dimensional Modeling & Star Schema

  • Introduction to Dimensional Modeling
  • Fact Tables vs Dimension Tables
  • Star Schema: Concepts & Design
  • Snowflake Schema: When to Use It?
  • Slowly Changing Dimensions (SCD) (Types 0, 1, 2, 3, 4, 6)
  • Handling Hierarchies & Aggregations
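
As a rough sketch only (tables and values invented, written in PySpark, which is covered in the next module), a star-schema style query joining a fact table to a dimension table:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star-schema-demo").getOrCreate()

# Hypothetical fact and dimension tables
fact_sales = spark.createDataFrame(
    [(1, 101, "2024-06-15", 250.0), (2, 102, "2024-06-16", 90.0)],
    ["sale_id", "product_key", "sale_date", "amount"],
)
dim_product = spark.createDataFrame(
    [(101, "Laptop", "Electronics"), (102, "Desk", "Furniture")],
    ["product_key", "product_name", "category"],
)

# Typical analytical query: join fact to dimension, aggregate by category
(fact_sales.join(dim_product, "product_key")
           .groupBy("category")
           .agg(F.sum("amount").alias("total_sales"))
           .show())
```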

4. ETL & Data Integration in Data Warehousing

  • Overview of ETL & ELT Processes
  • Common ETL Challenges & Solutions
  • Data Quality & Data Governance in ETL
  • Change Data Capture (CDC) Strategies

5. Modern Data Warehousing

  • Traditional Data Warehouses vs Cloud Data Warehouses
  • Introduction to Data Lakes & Data Lakehouses
  • Overview of Modern DW Platforms: Snowflake, BigQuery, Redshift, Synapse

PySpark

1. Introduction to PySpark

  • What is PySpark?
  • PySpark vs Pandas vs Dask
  • PySpark Architecture & Execution Model
  • Setting up PySpark in Google Colab
  • Introduction to SparkSession & DataFrames
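
A minimal sketch of creating a SparkSession and a first DataFrame; it assumes pyspark is installed (e.g. pip install pyspark in Colab):

```python
from pyspark.sql import SparkSession

# Entry point for all DataFrame work
spark = (SparkSession.builder
         .appName("intro-to-pyspark")
         .master("local[*]")      # local mode for practice environments
         .getOrCreate())

df = spark.createDataFrame(
    [("alice", 34), ("bob", 45)],
    ["name", "age"],
)
df.printSchema()
df.show()
```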

2. Data Loading & Basic Transformations in PySpark

  • Reading & Writing Data (CSV, JSON, Parquet, Avro)
  • Understanding Schema Inference & Defining Schemas
  • Basic Transformations: select(), filter(), withColumn(), drop()
  • Handling Nulls & Missing Data (fillna(), dropna(), replace())
  • Column Operations: cast(), alias(), when(), case()
  • Working with Date & Time Functions (current_date(), datediff(), date_add())
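
An illustrative sketch of the read, clean, transform, write cycle above; the input path and schema are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("basic-transformations").getOrCreate()

# Explicit schema instead of relying on inference
schema = StructType([
    StructField("order_id", StringType(), True),
    StructField("status", StringType(), True),
    StructField("amount", DoubleType(), True),
    StructField("order_date", StringType(), True),
])

orders = spark.read.csv("data/orders.csv", header=True, schema=schema)

cleaned = (orders
           .fillna({"status": "unknown"})
           .withColumn("order_date", F.to_date("order_date"))
           .withColumn("is_large", F.when(F.col("amount") > 100, True).otherwise(False))
           .withColumn("days_old", F.datediff(F.current_date(), F.col("order_date")))
           .filter(F.col("status") != "cancelled"))

cleaned.write.mode("overwrite").parquet("output/orders_clean/")
```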

3. Advanced PySpark Transformations

  • Grouping & Aggregations (groupBy(), agg(), pivot())
  • Joins in PySpark (inner, left, right, full)
  • Window Functions (Row Number, Ranking, Lead/Lag, Running Totals)
  • Exploding & Flattening Nested Data (explode(), array(), struct())
  • Working with UDFs (User-Defined Functions)
  • Broadcasting & Skew Handling
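
A short sketch (sample data invented) of grouping, aggregation, and window functions for a running total and ranking:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("advanced-transformations").getOrCreate()

sales = spark.createDataFrame(
    [("north", "2024-01", 100.0), ("north", "2024-02", 150.0),
     ("south", "2024-01", 80.0),  ("south", "2024-02", 60.0)],
    ["region", "month", "revenue"],
)

# Grouping & aggregation
by_region = sales.groupBy("region").agg(F.sum("revenue").alias("total_revenue"))
by_region.show()

# Window functions: running total and ranking within each region
w = Window.partitionBy("region").orderBy("month")
(sales.withColumn("running_total", F.sum("revenue").over(w))
      .withColumn("row_in_region", F.row_number().over(w))
      .show())
```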

4. Performance Optimization & Debugging in PySpark

  • Understanding Spark Execution Plan (explain(), cache(), persist())
  • Catalyst Optimizer & Tungsten Execution
  • Partitioning & Bucketing Strategies
  • Repartitioning & Coalescing
  • Optimizing Shuffle Operations
  • Performance Tuning Parameters (spark.conf.set())
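
A hedged sketch of a few tuning knobs; the configuration values are examples to experiment with, not recommendations:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("perf-tuning").getOrCreate()

# Common tuning parameters
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(10 * 1024 * 1024))

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

df.cache()                      # keep a frequently reused dataset in memory
agg = df.groupBy("bucket").count()
agg.explain()                   # inspect the physical plan and shuffles

# Fewer, larger output files: coalesce avoids a full repartition shuffle
agg.coalesce(1).write.mode("overwrite").parquet("output/bucket_counts/")
```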

PySpark Assignment Problem Statements

  • Problem Statement 1 – Hands-On Coding
  • Problem Statement 2 – Hands-On Coding

Capstone Project 1 – Complex PySpark Transformation – Hands-On Coding

Amazon Web Services (AWS)

1. AWS Setup & Fundamentals

  • Setting up AWS Account and Configuring IAM Roles & Policies
  • Creating S3 Buckets, Uploading Data, and Configuring Permissions
  • Implementing IAM Best Practices for Secure Data Access
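
A minimal boto3 sketch, assuming AWS credentials are already configured; the region, bucket, and file names are hypothetical:

```python
import boto3

s3 = boto3.client("s3", region_name="ap-south-1")

# Create a bucket (LocationConstraint is required outside us-east-1)
s3.create_bucket(
    Bucket="my-dataeng-raw-bucket",
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)

# Upload a local file and list what landed
s3.upload_file("data/orders.csv", "my-dataeng-raw-bucket", "raw/orders/orders.csv")

for obj in s3.list_objects_v2(Bucket="my-dataeng-raw-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```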

2. AWS Glue – Data Catalog & Crawler

  • Setting Up AWS Glue Crawler to Discover Metadata
  • Creating and Querying AWS Glue Catalog Tables
  • Schema Evolution & Handling Semi-Structured Data (JSON, Parquet)
  • Integrating Glue Catalog with Athena & Redshift Spectrum
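
A small boto3 sketch (crawler and database names hypothetical) of starting a crawler and listing the tables it registers in the Glue Data Catalog:

```python
import boto3

glue = boto3.client("glue", region_name="ap-south-1")

# Kick off an existing crawler (it runs asynchronously)
glue.start_crawler(Name="orders-raw-crawler")

# Once the crawl has finished, inspect the catalog tables
tables = glue.get_tables(DatabaseName="dataeng_catalog")
for table in tables["TableList"]:
    print(table["Name"], table["StorageDescriptor"]["Location"])
```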

3. AWS Athena – Querying Data Lake

  • Writing SQL Queries on S3 Data Using Athena
  • Optimizing Queries with Partitioning & Bucketing
  • Using Iceberg Tables in Athena for Time-Travel Queries
  • Performance Optimization: Query Federation & Compression Techniques
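
An illustrative boto3 sketch of submitting an Athena query against a partitioned catalog table; the database, table, and results bucket are hypothetical:

```python
import boto3

athena = boto3.client("athena", region_name="ap-south-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT order_date, SUM(amount) AS total_sales
        FROM orders_clean
        WHERE year = '2024' AND month = '06'   -- partition pruning
        GROUP BY order_date
    """,
    QueryExecutionContext={"Database": "dataeng_catalog"},
    ResultConfiguration={"OutputLocation": "s3://my-dataeng-athena-results/"},
)
print("QueryExecutionId:", response["QueryExecutionId"])
```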

4. AWS Glue PySpark – Data Transformation

  • Setting Up AWS Glue Job with PySpark
  • Transforming & Cleaning Raw Data Using PySpark in Glue
  • Handling Schema Drift in Glue ETL Pipelines
  • Writing Processed Data to S3, Redshift, and RDS
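
A skeleton Glue PySpark job for illustration; the catalog database, table, and output path are hypothetical:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog as a DynamicFrame, then drop to a DataFrame
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="dataeng_catalog", table_name="orders_raw"
)
df = dyf.toDF()

cleaned = (df.dropna(subset=["order_id"])
             .withColumn("amount", F.col("amount").cast("double")))

cleaned.write.mode("overwrite").parquet("s3://my-dataeng-curated-bucket/orders/")
job.commit()
```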

5. Real-Time Data Ingestion Using AWS Glue & REST API

  • Configuring AWS Glue Job to Ingest Data from REST API
  • Using AWS Lambda to Trigger Glue Jobs on Event Streams
  • Handling Real-Time Data Streams in PySpark
  • Writing Ingested Data to Iceberg Tables in Athena
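
A hedged sketch of landing REST API responses in S3 for downstream Glue processing; the endpoint and bucket are hypothetical:

```python
import json
import boto3
import requests

API_URL = "https://api.example.com/v1/orders"   # hypothetical endpoint
BUCKET = "my-dataeng-raw-bucket"

# Pull one page of records and fail loudly on HTTP errors
resp = requests.get(API_URL, params={"page": 1, "page_size": 500}, timeout=30)
resp.raise_for_status()
records = resp.json()

# Land the raw JSON in S3; a Glue job or Lambda trigger can pick it up
s3 = boto3.client("s3")
s3.put_object(
    Bucket=BUCKET,
    Key="raw/api/orders/page_1.json",
    Body=json.dumps(records).encode("utf-8"),
)
print(f"landed {len(records)} records")
```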

6. AWS Redshift – Data Warehousing

  • Setting Up an Amazon Redshift Cluster
  • Loading Data from S3 to Redshift Using COPY Command
  • Performance Tuning with Sort & Distribution Keys
  • Running Complex Analytical Queries in Redshift
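
An illustrative sketch using the Redshift Data API to run a COPY from S3; the cluster, database, IAM role, and paths are hypothetical:

```python
import boto3

redshift = boto3.client("redshift-data", region_name="ap-south-1")

copy_sql = """
    COPY analytics.orders
    FROM 's3://my-dataeng-curated-bucket/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

resp = redshift.execute_statement(
    ClusterIdentifier="dataeng-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
print("statement id:", resp["Id"])
```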

7. AWS CloudFormation – Infrastructure as Code

  • Creating S3, IAM Roles, Glue Jobs, and Redshift Using CloudFormation
  • Automating Data Pipeline Deployment Using CloudFormation Templates
  • Managing Stack Updates & Rollbacks
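
A minimal boto3 sketch of deploying a pipeline stack; the stack name and template file are hypothetical:

```python
import boto3

cfn = boto3.client("cloudformation", region_name="ap-south-1")

with open("templates/data_pipeline.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="aws-dataeng-pipeline",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],   # required when the template creates IAM roles
)

# Block until creation completes (or fails)
waiter = cfn.get_waiter("stack_create_complete")
waiter.wait(StackName="aws-dataeng-pipeline")
print("stack created")
```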

Assignment Problem Statements

  • Athena Assignment Problem Statement 1 – Hands-On Coding
  • Redshift Assignment Problem Statement 2 – Hands-On Coding
  • Glue PySpark Assignment Problem Statement 3 – Hands-On Coding

Final Capstone Project 2 – End-to-End Data Engineering Pipeline

What you’ll learn upon completing this training:

Course Instructed By:

Mr. Rahul R

A seasoned professional with over 14 years of experience in cloud and data engineering.
He has worked in key roles such as Principal Data Architect and CTO in leading tech organizations.
Expertise includes AWS, Azure, GCP, PySpark, Redshift, ADF, Snowflake, and Databricks.
Known for building enterprise-scale data platforms and mentoring tech professionals globally. An approved trainer at Raj Cloud Technologies.

Like the curriculum? Enroll Now

Log in or sign up using your Google account for fast enrolment and easy access.

AWS Data Engineering Online Live Training

Total Fee: ₹25,000/-

Timing: 7:00 PM IST

What our students say

Meghana R (@meghana-r)
I just want to share my experience of Natraj sir’s training; it is one of the best trainings I have ever had on Informatica. I learned lots of real-time concepts from Raj sir’s training, and they are very useful in my job. The training is based on real-time scenarios, so you become familiar with the concepts of Informatica, Oracle, and Unix. Thank you, Raj sir, for giving us such a nice training and so much confidence.

Akash Dhus
It’s a fantastic course for a beginner too. I could feel the effort that was put in to make sure people understood. Thank you, Raj; when I become one of the greatest, I will remember this beginning. A wonderful experience. The lecturers are great, with a very nice way of interacting and lots of useful material. Thank you for all your cooperation. Hope to see more of you in the future. Thank you once again.

Download Course Curriculum (Syllabus)


AWS Data Engineering Live Training – New Batch Starting Soon!

Interested to join? Enquire now using the contact form or call +91 81052 96858.
