DP-203 Data Engineer

Training course to pass the official Microsoft Data Engineer certification

  • Design and implement data storage (40-45%)
  • Design and develop data processing (25-30%)
  • Design and implement data security (10-15%)
  • Monitor and optimize data storage and data processing (10-15%)

  • Hours: 30 (hours dedicated to the course)
  • Classes: 273 (total classes)
  • Study: 45 (hours of dedicated study)
  • Level: 8 (Hard)

Design and implement data storage

  • Understanding data
  • Lab - Azure Storage accounts
  • Lab - Azure SQL databases
  • Lab - Application connecting to Azure Storage and SQL database
  • Lab - Application connecting to Azure Storage and SQL database - Resources
  • Different file formats
  • Azure Data Lake Gen-2 storage accounts
  • Lab - Creating an Azure Data Lake Gen-2 storage account
  • Using PowerBI to view your data
  • Lab - Authorizing to Azure Data Lake Gen 2 - Access Keys - Storage Explorer
  • Lab - Authorizing to Azure Data Lake Gen 2 - Shared Access Signatures
  • Azure Storage Account - Redundancy
  • Azure Storage Account - Access tiers
  • Azure Storage Account - Lifecycle policy
  • Note on Costing
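
As a taste of the storage labs above, here is a minimal sketch in Python that uploads and lists blobs with the azure-storage-blob SDK; the same API also works against a Data Lake Gen-2 account. The connection string, container name and paths are placeholders to replace with your own.

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Placeholder connection string and container; substitute your own account details.
conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("raw-data")

# Upload a local file as a blob.
with open("Log.csv", "rb") as data:
    container.upload_blob(name="raw/Log.csv", data=data, overwrite=True)

# List everything under the "raw/" prefix.
for blob in container.list_blobs(name_starts_with="raw/"):
    print(blob.name, blob.size)
```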

Design and implement data storage - Transact-SQL

  • The internals of a database engine
  • Lab - Setting up a new Azure SQL database
  • Code for this section
  • Lab - T-SQL - SELECT clause
  • Lab - T-SQL - WHERE clause
  • Lab - T-SQL - ORDER BY clause
  • Lab - T-SQL - Aggregate Functions
  • Lab - T-SQL - GROUP BY clause
  • Lab - T-SQL - HAVING clause
  • Quick Review on Primary and Foreign Keys
  • Lab - T-SQL - Creating Tables with Keys
  • Lab - T-SQL - Table Joins
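
The T-SQL labs above can be reproduced from any client; the sketch below uses Python with pyodbc against a hypothetical Azure SQL database. The server, credentials and the dbo.SalesOrder table are illustrative placeholders.

```python
# pip install pyodbc
import pyodbc

# Placeholder server, credentials and table; replace with your Azure SQL details.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:<server>.database.windows.net,1433;"
    "DATABASE=appdb;UID=sqladmin;PWD=<password>;Encrypt=yes;"
)

# Aggregate functions with GROUP BY, HAVING and ORDER BY, as in the labs above.
query = """
SELECT   Category, COUNT(*) AS Orders, SUM(Amount) AS Total
FROM     dbo.SalesOrder
GROUP BY Category
HAVING   COUNT(*) > 10
ORDER BY Total DESC;
"""
for row in conn.cursor().execute(query):
    print(row.Category, row.Orders, row.Total)
conn.close()
```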

Design and implement data storage - Azure Synapse Analytics

  • Describe the basics of the Databricks SQL service
  • Why do we need a data warehouse
  • Welcome to Azure Synapse Analytics
  • Lab - Creating a SQL pool
  • Lab - SQL Pool - External Tables - CSV
  • Data Cleansing
  • Lab - SQL Pool - External Tables - CSV with formatted data
  • Lab - SQL Pool - External Tables - Parquet - Part 1
  • Lab - SQL Pool - External Tables - Parquet - Part 2
  • Loading data into the Dedicated SQL Pool
  • Lab - Loading data into a table - COPY Command - CSV
  • Lab - Loading data into a table - COPY Command - Parquet
  • Pausing the Dedicated SQL pool
  • Lab - Loading data using PolyBase
  • Lab - BULK INSERT from Azure Synapse
  • My own experience
  • Designing a data warehouse
  • More on dimension tables
  • Lab - Building a data warehouse - Setting up the database
  • Lab - Building a Fact Table
  • Lab - Building a dimension table
  • Lab - Transfer data to our SQL Pool
  • Other points in the copy activity
  • Lab - Using Power BI for Star Schema
  • Understanding Azure Synapse Architecture
  • Understanding table types
  • Understanding Round-Robin tables
  • Lab - Creating Hash-distributed Tables
  • Note on creating replicated tables
  • Designing your tables
  • Designing tables - Review
  • Lab - Example when using the right distributions for your tables
  • Points on tables in Azure Synapse
  • Lab - Windowing Functions
  • Lab - Reading JSON files
  • Lab - Surrogate keys for dimension tables
  • Slowly Changing dimensions
  • Type 3 Slowly Changing Dimension
  • Creating a heap table
  • Snowflake schema
  • Lab - CASE statement
  • Partitions in Azure Synapse
  • Lab - Creating a table with partitions
  • Lab - Switching partitions
  • Indexes
  • Quick Note - Modern Data Warehouse Architecture
  • Quick Note on what we are taking forward to the next sections
  • What about the Spark Pool
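
To illustrate the table-distribution material covered above, here is a minimal sketch that creates a hash-distributed fact table in a dedicated SQL pool via pyodbc. The workspace endpoint, credentials and the dbo.FactSales schema are placeholders.

```python
import pyodbc

# Placeholder dedicated SQL pool endpoint and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:<workspace>.sql.azuresynapse.net,1433;"
    "DATABASE=sqlpool;UID=sqladmin;PWD=<password>;Encrypt=yes;",
    autocommit=True,
)

# A fact table hash-distributed on the column it is most often joined on,
# with a clustered columnstore index (the dedicated SQL pool default).
conn.cursor().execute("""
CREATE TABLE dbo.FactSales
(
    SaleId     BIGINT        NOT NULL,
    CustomerId INT           NOT NULL,
    Amount     DECIMAL(10,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);""")
conn.close()
```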

Design and Develop Data Processing - ADF (Data Factory)

  • Extract, Transform and Load
  • What is Azure Data Factory
  • Starting with Azure Data Factory
  • Lab - Azure Data Lake to Azure Synapse - Log.csv file
  • Lab - Azure Data Lake to Azure Synapse - Parquet files
  • Review on what has been done so far
  • Lab - Generating a Parquet file
  • Lab - What about using a query for data transfer
  • Deleting artefacts in Azure Data Factory
  • Mapping Data Flow
  • Lab - Mapping Data Flow - Fact Table
  • Lab - Mapping Data Flow - Dimension Table - DimCustomer
  • Lab - Mapping Data Flow - Dimension Table - DimProduct
  • Lab - Surrogate Keys - Dimension tables
  • Lab - Using Cache sink
  • Lab - Handling Duplicate rows
  • Changing connection details
  • Lab - Changing the Time column data in our Log.csv file
  • Lab - Convert Parquet to JSON
  • Lab - Loading JSON into SQL Pool
  • Self-Hosted Integration Runtime
  • Lab - Self-Hosted Runtime - Setting up nginx
  • Lab - Self-Hosted Runtime - Setting up the runtime
  • Lab - Self-Hosted Runtime - Copy Activity
  • Lab - Self-Hosted Runtime - Mapping Data Flow
  • Lab - Processing JSON Arrays
  • Lab - Processing JSON Objects
  • Lab - Conditional Split
  • Lab - Schema Drift
  • Lab - Metadata activity
  • Lab - Azure DevOps - Git configuration
  • Lab - Azure DevOps - Release configuration
  • What resources are we taking forward
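
The pipelines built in these labs are normally run from the Data Factory Studio UI, but as a sketch of automating that step, the snippet below triggers a pipeline run through the Azure management REST API. The subscription, resource group, factory and pipeline names are all placeholders.

```python
# pip install azure-identity requests
import requests
from azure.identity import DefaultAzureCredential

# All four names below are placeholders.
SUB, RG, FACTORY, PIPELINE = "<subscription-id>", "<resource-group>", "<factory>", "<pipeline>"

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
    f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
    f"/pipelines/{PIPELINE}/createRun?api-version=2018-06-01"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json={})
resp.raise_for_status()
print("runId:", resp.json()["runId"])
```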

Design and Develop Data Processing - Azure Event Hubs and Stream Analytics

  • Code for this section
  • Batch and Real-Time Processing
  • What are Azure Event Hubs
  • Lab - Creating an instance of Event hub
  • Lab - Sending and Receiving Events
  • What is Azure Stream Analytics
  • Lab - Creating a Stream Analytics job
  • Lab - Azure Stream Analytics - Defining the job
  • Review on what we have seen so far
  • Lab - Reading database diagnostic data - Setup
  • Lab - Reading data from a JSON file - Setup
  • Lab - Reading data from a JSON file - Implementation
  • Lab - Reading data from the Event Hub - Setup
  • Lab - Reading data from the Event Hub - Implementation
  • Lab - Timing windows
  • Lab - Adding multiple outputs
  • Lab - Reference data
  • Lab - OVER clause
  • Lab - Power BI Output
  • Lab - Reading Network Security Group Logs - Server Setup
  • Lab - Reading Network Security Group Logs - Enabling NSG Flow Logs
  • Lab - Reading Network Security Group Logs - Processing the data
  • Lab - User Defined Functions
  • Custom Serialization Formats
  • Lab - Azure Event Hubs - Capture Feature
  • Lab - Azure Data Factory - Incremental Data Copy
  • Demo on Azure IoT Devkit
  • What resources are we taking forward
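
As a flavour of the event-driven labs above, this sketch publishes a couple of JSON telemetry events with the azure-eventhub SDK; the connection string and hub name are placeholders.

```python
# pip install azure-eventhub
import json
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder namespace connection string and hub name.
producer = EventHubProducerClient.from_connection_string(
    "<event-hub-namespace-connection-string>",
    eventhub_name="telemetry",
)

with producer:
    batch = producer.create_batch()
    # JSON payloads like these are what the Stream Analytics job reads as input.
    for reading in ({"deviceId": "dev1", "temperature": 21.5},
                    {"deviceId": "dev2", "temperature": 25.1}):
        batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)
```

On the Stream Analytics side, a timing-window query such as SELECT deviceId, AVG(temperature) FROM input GROUP BY deviceId, TumblingWindow(second, 30) would then aggregate these events every 30 seconds.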

Design and Develop Data Processing - Scala, Notebooks and Spark

  • Introduction to Scala
  • Installing Scala
  • Scala - Playing with values
  • Scala - Installing IntelliJ IDE
  • Scala - If construct
  • Scala - for construct
  • Scala - while construct
  • Scala - case construct
  • Scala - Functions
  • Scala - List collection
  • Starting with Python
  • Python - A simple program
  • Python - If construct
  • Python - while construct
  • Python - List collection
  • Python - Functions
  • Quick look at Jupyter Notebook
  • Lab - Azure Synapse - Creating a Spark pool
  • Lab - Spark Pool - Starting out with Notebooks
  • Lab - Spark Pool - Spark DataFrames
  • Lab - Spark Pool - Sorting data
  • Lab - Spark Pool - Load data
  • Lab - Spark Pool - Removing NULL values
  • Lab - Spark Pool - Using SQL statements
  • Lab - Spark Pool - Write data to Azure Synapse
  • Spark Pool - Combined Power
  • Lab - Spark Pool - Sharing tables
  • Lab - Spark Pool - Creating tables
  • Lab - Spark Pool - JSON files
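
A minimal Spark pool sketch combining the DataFrame API with SQL, along the lines of the notebook labs above. It assumes it runs in a Synapse notebook where spark is predefined; the abfss path and the OperationName column are illustrative.

```python
# Runs inside a Synapse Spark pool notebook, where `spark` is predefined.
# The abfss path and the OperationName column are illustrative.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("abfss://data@<datalake>.dfs.core.windows.net/raw/Log.csv"))

# Drop rows containing NULLs and sort, as in the labs above.
clean = df.dropna().orderBy("Time")

# Mix the DataFrame API with plain SQL statements.
clean.createOrReplaceTempView("logdata")
spark.sql("SELECT OperationName, COUNT(*) AS Hits "
          "FROM logdata GROUP BY OperationName").show()
```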

Design and Develop Data Processing - Azure Databricks

  • What is Azure Databricks
  • Clusters in Azure Databricks
  • Lab - Creating a workspace
  • Lab - Creating a cluster
  • Lab - Simple notebook
  • Lab - Using DataFrames
  • Lab - Reading a CSV file
  • Databricks File System
  • Lab - The SQL Data Frame
  • Visualizations
  • Lab - Few functions on dates
  • Lab - Filtering on NULL values
  • Lab - Parquet-based files
  • Lab - JSON-based files
  • Lab - Structured Streaming - Let's first understand our data
  • Lab - Structured Streaming - Streaming from Azure Event Hubs - Initial steps
  • Lab - Structured Streaming - Streaming from Azure Event Hubs - Implementation
  • Lab - Getting data from Azure Data Lake - Setup
  • Lab - Getting data from Azure Data Lake - Implementation
  • Lab - Writing data to Azure Synapse SQL Dedicated Pool
  • Lab - Stream and write to Azure Synapse SQL Dedicated Pool
  • Lab - Azure Data Lake Storage Credential Passthrough
  • Lab - Running an automated job
  • Autoscaling a cluster
  • Lab - Removing duplicate rows
  • Lab - Using the PIVOT command
  • Lab - Azure Databricks Table
  • Lab - Azure Data Factory - Running a notebook
  • Delta Lake Introduction
  • Lab - Creating a Delta Table
  • Lab - Streaming data into the table
  • Lab - Time Travel
  • Quick note on deciding between Azure Synapse and Azure Databricks
  • What resources are we taking forward
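
The Delta Lake lessons close this section; as a minimal sketch, the snippet below creates a Delta table, overwrites it to produce a second table version, and reads the first version back with Time Travel. The DBFS path and sample rows are placeholders.

```python
# Runs in an Azure Databricks notebook; the DBFS path and sample rows are placeholders.
path = "/delta/orders"

# Create a Delta table from a small DataFrame.
spark.createDataFrame([(1, "open"), (2, "shipped")], ["orderId", "status"]) \
    .write.format("delta").mode("overwrite").save(path)

# Overwriting produces a new table version...
spark.createDataFrame([(1, "shipped"), (2, "shipped")], ["orderId", "status"]) \
    .write.format("delta").mode("overwrite").save(path)

# ...which Time Travel lets you read back by version number.
spark.read.format("delta").option("versionAsOf", 0).load(path).show()
```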

Design and Implement Data Security

  • What is the Azure Key Vault service
  • Azure Data Factory - Encryption
  • Azure Synapse - Customer Managed Keys
  • Azure Dedicated SQL Pool - Transparent Data Encryption
  • Lab - Azure Synapse - Data Masking
  • Lab - Azure Synapse - Auditing
  • Azure Synapse - Data Discovery and Classification
  • Azure Synapse - Azure AD Authentication
  • Lab - Azure Synapse - Azure AD Authentication - Setting the admin
  • Lab - Azure Synapse - Azure AD Authentication - Creating a user
  • Lab - Azure Synapse - Row-Level Security
  • Lab - Azure Synapse - Column-Level Security
  • Lab - Azure Data Lake - Role Based Access Control
  • Lab - Azure Data Lake - Access Control Lists
  • Lab - Azure Synapse - External Tables Authorization via Managed Identity
  • Lab - Azure Synapse - External Tables Authorization via Azure AD Authentication
  • Lab - Azure Synapse - Firewall
  • Lab - Azure Data Lake - Virtual Network Service Endpoint
  • Lab - Azure Data Lake - Managed Identity - Data Factory
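
Several of these security labs hinge on keeping secrets out of code. As a minimal sketch, the snippet below pulls a connection string from Azure Key Vault with DefaultAzureCredential; the vault URL and secret name are placeholders.

```python
# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name; the caller needs a "get" secret permission.
client = SecretClient(
    vault_url="https://<vault>.vault.azure.net",
    credential=DefaultAzureCredential(),
)

# Fetch a connection string instead of hard-coding it in notebooks or pipelines.
sql_conn = client.get_secret("sqlpool-connection-string").value
```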

Monitor and optimize data storage and data processing

  • Azure Storage accounts - Query acceleration
  • View on Azure Monitor
  • Azure Monitor - Alerts
  • Azure Synapse - System Views
  • Azure Synapse - Result set caching
  • Azure Synapse - Workload Management
  • Azure Synapse - Retention points
  • Lab - Azure Data Factory - Monitoring
  • Azure Data Factory - Monitoring - Alerts and Metrics
  • Lab - Azure Data Factory - Annotations
  • Azure Data Factory - Integration Runtime - Note
  • Azure Data Factory - Pipeline Failures
  • Azure Key Vault - High Availability
  • Azure Stream Analytics - Metrics
  • Azure Stream Analytics - Streaming Units
  • Azure Stream Analytics - An example on monitoring the stream analytics job
  • Azure Stream Analytics - The importance of time
  • Azure Stream Analytics - More on the time aspect
  • Azure Event Hubs and Stream Analytics - Partitions
  • Azure Stream Analytics - An example on multiple partitions
  • Azure Stream Analytics - More on partitions
  • Azure Stream Analytics - An example on diagnosing errors
  • Azure Stream Analytics - Diagnostics setting
  • Azure Databricks - Monitoring
  • Azure Databricks - Sending logs to Azure Monitor
  • Azure Event Hubs - High Availability
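
As an example of the Synapse system views mentioned above, this sketch lists the longest-running recent requests in a dedicated SQL pool via sys.dm_pdw_exec_requests; the connection details are placeholders.

```python
import pyodbc

# Placeholder dedicated SQL pool connection, as in the earlier sections.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:<workspace>.sql.azuresynapse.net,1433;"
    "DATABASE=sqlpool;UID=sqladmin;PWD=<password>;Encrypt=yes;"
)

# sys.dm_pdw_exec_requests shows recent requests with their status and elapsed
# time, a common first stop when diagnosing a slow dedicated SQL pool.
for row in conn.cursor().execute(
    "SELECT TOP 10 request_id, status, total_elapsed_time, command "
    "FROM sys.dm_pdw_exec_requests ORDER BY total_elapsed_time DESC;"
):
    print(row.request_id, row.status, row.total_elapsed_time)
conn.close()
```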

Satisfaction

What did I learn?

  • Azure Data Lake Gen 2 storage.
  • Transact-SQL commands.
  • How to work with Azure Synapse.
  • How to build an ETL pipeline with the help of Azure Data Factory.
  • How to stream data using Azure Stream Analytics.
  • The Scala programming language and Spark.
  • How to work with Spark and Scala in Azure Databricks.
  • How to work with notebooks.
  • How to stream data into Azure Databricks.
  • The different security measures and monitoring aspects to consider when working with Azure services.