Training course to pass the official Microsoft Data Engineer certification
- Design and implement data storage (40-45%)
- Design and develop data processing (25-30%)
- Design and implement data security (10-15%)
- Monitor and optimize data storage and data processing (10-15%)
- 30 hours dedicated to the course.
- 273 classes in total.
- 45 hours of personal study.
- Difficulty level: 8 (Hard).
Design and implement data storage
- Understanding data
- Lab - Azure Storage accounts.
- Lab - Azure SQL databases.
- Lab - Application connecting to Azure Storage and SQL database.
- Lab - Application connecting to Azure Storage and SQL database - Resources.
- Different file formats
- Azure Data Lake Gen-2 storage accounts
- Lab - Creating an Azure Data Lake Gen-2 storage account
- Using PowerBI to view your data
- Lab - Authorizing to Azure Data Lake Gen 2 - Access Keys - Storage Explorer
- Lab - Authorizing to Azure Data Lake Gen 2 - Shared Access Signatures
- Azure Storage Account - Redundancy
- Azure Storage Account - Access tiers
- Azure Storage Account - Lifecycle policy
- Note on Costing
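The access-tier labs above come down to a couple of SDK calls. Here's a minimal sketch with the azure-storage-blob package, assuming an existing storage account; the account URL, container, and blob names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, StandardBlobTier

# Placeholder account URL; authenticate with whatever identity is available.
service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
blob = service.get_blob_client(container="logs", blob="2023/log.csv")

# Upload straight into the Cool tier: cheaper storage for rarely read data.
with open("log.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True, standard_blob_tier=StandardBlobTier.COOL)

# Later, push the blob down to Archive (what a lifecycle policy would automate).
blob.set_standard_blob_tier(StandardBlobTier.ARCHIVE)
```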
Design and implement data storage - Transact-SQL
- The internals of a database engine
- Lab - Setting up a new Azure SQL database
- Code for this section
- Lab - T-SQL - SELECT clause
- Lab - T-SQL - WHERE clause
- Lab - T-SQL - ORDER BY clause
- Lab - T-SQL - Aggregate Functions
- Lab - T-SQL - GROUP BY clause
- Lab - T-SQL - HAVING clause
- Quick Review on Primary and Foreign Keys
- Lab - T-SQL - Creating Tables with Keys
- Lab - T-SQL - Table Joins
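To give a flavour of the T-SQL these labs cover, here's a minimal sketch running a GROUP BY / HAVING query from Python with pyodbc; the server, database, credentials, and the SalesOrder table are all placeholders:

```python
import pyodbc

# Placeholder connection details for an Azure SQL database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;"
    "UID=<user>;PWD=<password>"
)

# Customers with more than five orders, biggest spenders first.
query = """
SELECT CustomerId, COUNT(*) AS OrderCount, SUM(Total) AS TotalSpent
FROM SalesOrder
GROUP BY CustomerId
HAVING COUNT(*) > 5
ORDER BY TotalSpent DESC;
"""
for row in conn.execute(query):
    print(row.CustomerId, row.OrderCount, row.TotalSpent)
conn.close()
```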
Design and implement data storage - Azure Synapse Analytics
- Describe the basics of the Databricks SQL service.
- Why do we need a data warehouse
- Welcome to Azure Synapse Analytics
- Lab - Creating a SQL pool
- Lab - SQL Pool - External Tables - CSV
- Data Cleansing
- Lab - SQL Pool - External Tables - CSV with formatted data
- Lab - SQL Pool - External Tables - Parquet - Part 1
- Lab - SQL Pool - External Tables - Parquet - Part 2
- Loading data into the Dedicated SQL Pool
- Lab - Loading data into a table - COPY Command - CSV
- Lab - Loading data into a table - COPY Command - Parquet
- Pausing the Dedicated SQL pool
- Lab - Loading data using PolyBase
- Lab - BULK INSERT from Azure Synapse
- My own experience
- Designing a data warehouse
- More on dimension tables
- Lab - Building a data warehouse - Setting up the database
- Lab - Building a Fact Table
- Lab - Building a dimension table
- Lab - Transfer data to our SQL Pool
- Other points in the copy activity
- Lab - Using Power BI for Star Schema
- Understanding Azure Synapse Architecture
- Understanding table types
- Understanding Round-Robin tables
- Lab - Creating Hash-distributed Tables
- Note on creating replicated tables
- Designing your tables
- Designing tables - Review
- Lab - Example when using the right distributions for your tables
- Points on tables in Azure Synapse
- Lab - Windowing Functions
- Lab - Reading JSON files
- Lab - Surrogate keys for dimension tables
- Slowly Changing dimensions
- Type 3 Slowly Changing Dimension
- Creating a heap table
- Snowflake schema
- Lab - CASE statement
- Partitions in Azure Synapse
- Lab - Creating a table with partitions
- Lab - Switching partitions
- Indexes
- Quick Note - Modern Data Warehouse Architecture
- Quick Note on what we are taking forward to the next sections
- What about the Spark Pool
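Two of the ideas above fit in one sketch: a hash-distributed fact table and the COPY command. This assumes a pyodbc connection to the dedicated SQL pool; the endpoint, credentials, table, and data lake URL are placeholders:

```python
import pyodbc

# Placeholder endpoint for the dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<sqlpool>;"
    "UID=<user>;PWD=<password>"
)

# Hash-distribute on the join key so matching rows land on the same
# distribution; columnstore is the usual index choice for large fact tables.
conn.execute("""
CREATE TABLE FactSales (
    SaleId     INT NOT NULL,
    CustomerId INT NOT NULL,
    Total      DECIMAL(10, 2)
)
WITH (
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);
""")

# COPY pulls Parquet files straight from the data lake into the table.
conn.execute("""
COPY INTO FactSales
FROM 'https://<datalake>.dfs.core.windows.net/raw/sales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');
""")
conn.commit()
```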
Design and Develop Data Processing - ADF (Data Factory)
- Extract, Transform and Load
- What is Azure Data Factory
- Starting with Azure Data Factory
- Lab - Azure Data Lake to Azure Synapse - Log.csv file
- Lab - Azure Data Lake to Azure Synapse - Parquet files
- Review on what has been done so far
- Lab - Generating a Parquet file
- Lab - What about using a query for data transfer
- Deleting artefacts in Azure Data Factory
- Mapping Data Flow
- Lab - Mapping Data Flow - Fact Table
- Lab - Mapping Data Flow - Dimension Table - DimCustomer
- Lab - Mapping Data Flow - Dimension Table - DimProduct
- Lab - Surrogate Keys - Dimension tables
- Lab - Using Cache sink
- Lab - Handling Duplicate rows
- Changing connection details
- Lab - Changing the Time column data in our Log.csv file
- Lab - Convert Parquet to JSON
- Lab - Loading JSON into SQL Pool
- Self-Hosted Integration Runtime
- Lab - Self-Hosted Runtime - Setting up nginx
- Lab - Self-Hosted Runtime - Setting up the runtime
- Lab - Self-Hosted Runtime - Copy Activity
- Lab - Self-Hosted Runtime - Mapping Data Flow
- Lab - Processing JSON Arrays
- Lab - Processing JSON Objects
- Lab - Conditional Split
- Lab - Schema Drift
- Lab - Metadata activity
- Lab - Azure DevOps - Git configuration
- Lab - Azure DevOps - Release configuration
- What resources are we taking forward
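Beyond the portal, pipelines can also be triggered from code. A minimal sketch with the azure-mgmt-datafactory package, assuming the pipeline already exists; the subscription, resource group, factory, pipeline name, and parameter are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Kick off a run of a hypothetical pipeline with one parameter.
run = client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory>",
    pipeline_name="CopyLogToSynapse",
    parameters={"fileName": "Log.csv"},
)
print("Started pipeline run:", run.run_id)

# Poll the run status (Queued / InProgress / Succeeded / Failed).
status = client.pipeline_runs.get("<resource-group>", "<data-factory>", run.run_id)
print(status.status)
```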
Design and Develop Data Processing - Azure Event Hubs and Stream Analytics
- Code for this section
- Batch and Real-Time Processing
- What are Azure Event Hubs
- Lab - Creating an instance of Event hub
- Lab - Sending and Receiving Events
- What is Azure Stream Analytics
- Lab - Creating a Stream Analytics job
- Lab - Azure Stream Analytics - Defining the job
- Review on what we have seen so far
- Lab - Reading database diagnostic data - Setup
- Lab - Reading data from a JSON file - Setup
- Lab - Reading data from a JSON file - Implementation
- Lab - Reading data from the Event Hub - Setup
- Lab - Reading data from the Event Hub - Implementation
- Lab - Timing windows
- Lab - Adding multiple outputs
- Lab - Reference data
- Lab - OVER clause
- Lab - Power BI Output
- Lab - Reading Network Security Group Logs - Server Setup
- Lab - Reading Network Security Group Logs - Enabling NSG Flow Logs
- Lab - Reading Network Security Group Logs - Processing the data
- Lab - User Defined Functions
- Custom Serialization Formats
- Lab - Azure Event Hubs - Capture Feature
- Lab - Azure Data Factory - Incremental Data Copy
- Demo on Azure IoT Devkit
- What resources are we taking forward
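The sending side of these labs is compact. A minimal sketch pushing JSON events to an Event Hub with the azure-eventhub SDK, so a Stream Analytics job can consume them; the connection string, hub name, and sample readings are placeholders:

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    "<event-hub-namespace-connection-string>", eventhub_name="telemetry"
)

with producer:
    batch = producer.create_batch()
    for reading in ({"deviceId": "d1", "temp": 21.5}, {"deviceId": "d2", "temp": 19.0}):
        batch.add(EventData(json.dumps(reading)))  # Stream Analytics reads JSON natively
    producer.send_batch(batch)
```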
Design and Develop Data Processing - Scala, Notebooks and Spark
- Introduction to Scala
- Installing Scala
- Scala - Playing with values
- Scala - Installing IntelliJ IDE
- Scala - If construct
- Scala - for construct
- Scala - while construct
- Scala - case construct
- Scala - Functions
- Scala - List collection
- Starting with Python
- Python - A simple program
- Python - If construct
- Python - while construct
- Python - List collection
- Python - Functions
- Quick look at Jupyter Notebook
- Lab - Azure Synapse - Creating a Spark pool
- Lab - Spark Pool - Starting out with Notebooks
- Lab - Spark Pool - Spark DataFrames
- Lab - Spark Pool - Sorting data
- Lab - Spark Pool - Load data
- Lab - Spark Pool - Removing NULL values
- Lab - Spark Pool - Using SQL statements
- Lab - Spark Pool - Write data to Azure Synapse
- Spark Pool - Combined Power
- Lab - Spark Pool - Sharing tables
- Lab - Spark Pool - Creating tables
- Lab - Spark Pool - JSON files
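The notebook flow in these labs looks roughly like this PySpark sketch: load a CSV from the data lake, drop NULLs, sort, and query with SQL. The abfss path and the Log.csv column names are assumptions, and `spark` is the session a Synapse notebook provides:

```python
# Read the raw CSV with a header row, letting Spark infer the schema.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@<datalake>.dfs.core.windows.net/Log.csv")
)

# Drop rows with NULL in a key column, then sort chronologically.
clean = df.na.drop(subset=["Operationname"]).orderBy("Time")

# Expose the DataFrame to SQL and aggregate.
clean.createOrReplaceTempView("logdata")
spark.sql(
    "SELECT Operationname, COUNT(*) AS cnt FROM logdata GROUP BY Operationname"
).show()
```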
Design and Develop Data Processing - Azure Databricks
- What is Azure Databricks
- Clusters in Azure Databricks
- Lab - Creating a workspace
- Lab - Creating a cluster
- Lab - Simple notebook
- Lab - Using DataFrames
- Lab - Reading a CSV file
- Databricks File System
- Lab - The SQL Data Frame
- Visualizations
- Lab - Few functions on dates
- Lab - Filtering on NULL values
- Lab - Parquet-based files
- Lab - JSON-based files
- Lab - Structured Streaming - Let's first understand our data
- Lab - Structured Streaming - Streaming from Azure Event Hubs - Initial steps
- Lab - Structured Streaming - Streaming from Azure Event Hubs - Implementation
- Lab - Getting data from Azure Data Lake - Setup
- Lab - Getting data from Azure Data Lake - Implementation
- Lab - Writing data to Azure Synapse SQL Dedicated Pool
- Lab - Stream and write to Azure Synapse SQL Dedicated Pool
- Lab - Azure Data Lake Storage Credential Passthrough
- Lab - Running an automated job
- Autoscaling a cluster
- Lab - Removing duplicate rows
- Lab - Using the PIVOT command
- Lab - Azure Databricks Table
- Lab - Azure Data Factory - Running a notebook
- Delta Lake Introduction
- Lab - Creating a Delta Table
- Lab - Streaming data into the table
- Lab - Time Travel
- Quick note on deciding between Azure Synapse and Azure Databricks
- What resources are we taking forward
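Delta Lake's time travel is easy to show in a few lines. A minimal sketch on a Databricks cluster (where Delta and the `spark` session are provided); the table path and sample data are placeholders:

```python
path = "/mnt/datalake/delta/customers"  # placeholder DBFS mount path

df = spark.createDataFrame([(1, "Ana"), (2, "Luis")], ["CustomerId", "Name"])
updates = spark.createDataFrame([(3, "Marta")], ["CustomerId", "Name"])

df.write.format("delta").mode("overwrite").save(path)    # creates version 0
updates.write.format("delta").mode("append").save(path)  # creates version 1

# Time travel: read the table as it was before the append.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
v0.show()  # only Ana and Luis
```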
Design and Implement Data Security
- What is the Azure Key Vault service
- Azure Data Factory - Encryption
- Azure Synapse - Customer Managed Keys
- Azure Dedicated SQL Pool - Transparent Data Encryption
- Lab - Azure Synapse - Data Masking
- Lab - Azure Synapse - Auditing
- Azure Synapse - Data Discovery and Classification
- Azure Synapse - Azure AD Authentication
- Lab - Azure Synapse - Azure AD Authentication - Setting the admin
- Lab - Azure Synapse - Azure AD Authentication - Creating a user
- Lab - Azure Synapse - Row-Level Security
- Lab - Azure Synapse - Column-Level Security
- Lab - Azure Data Lake - Role Based Access Control
- Lab - Azure Data Lake - Access Control Lists
- Lab - Azure Synapse - External Tables Authorization via Managed Identity
- Lab - Azure Synapse - External Tables Authorization via Azure AD Authentication
- Lab - Azure Synapse - Firewall
- Lab - Azure Data Lake - Virtual Network Service Endpoint
- Lab - Azure Data Lake - Managed Identity - Data Factory
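The Key Vault lesson above is the habit worth keeping: never hard-code credentials. A minimal sketch with azure-keyvault-secrets; the vault URL and secret name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=DefaultAzureCredential(),  # works for users, service principals, managed identities
)

# Fetch a hypothetical secret instead of embedding the password in code.
sql_password = client.get_secret("synapse-sql-password").value
```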
Monitor and optimize data storage and data processing
- Azure Storage accounts - Query acceleration
- View on Azure Monitor
- Azure Monitor - Alerts
- Azure Synapse - System Views
- Azure Synapse - Result set caching
- Azure Synapse - Workload Management
- Azure Synapse - Retention points
- Lab - Azure Data Factory - Monitoring
- Azure Data Factory - Monitoring - Alerts and Metrics
- Lab - Azure Data Factory - Annotations
- Azure Data Factory - Integration Runtime - Note
- Azure Data Factory - Pipeline Failures
- Azure Key Vault - High Availability
- Azure Stream Analytics - Metrics
- Azure Stream Analytics - Streaming Units
- Azure Stream Analytics - An example on monitoring the stream analytics job
- Azure Stream Analytics - The importance of time
- Azure Stream Analytics - More on the time aspect
- Azure Event Hubs and Stream Analytics - Partitions
- Azure Stream Analytics - An example on multiple partitions
- Azure Stream Analytics - More on partitions
- Azure Stream Analytics - An example on diagnosing errors
- Azure Stream Analytics - Diagnostics setting
- Azure Databricks - Monitoring
- Azure Databricks - Sending logs to Azure Monitor
- Azure Event Hubs - High Availability
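The query acceleration lesson at the top of this section has a neat SDK angle: filter a CSV blob server-side so only matching rows cross the wire. A minimal sketch with azure-storage-blob; the account, container, blob, and the column/value in the predicate are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient, DelimitedTextDialect

blob = BlobClient(
    account_url="https://<account>.blob.core.windows.net",
    container_name="raw",
    blob_name="Log.csv",
    credential=DefaultAzureCredential(),
)

# Describe the CSV layout for both input and output.
fmt = DelimitedTextDialect(delimiter=",", quotechar='"', has_header=True)

# The storage service evaluates the filter; we download only matching rows.
reader = blob.query_blob(
    "SELECT * FROM BlobStorage WHERE Resourcegroup = 'app-grp'",
    blob_format=fmt,
    output_format=fmt,
)
print(reader.readall().decode("utf-8"))
```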
Satisfaction
What did I learn?
- Azure Data Lake Gen 2 storage.
- Transact-SQL commands.
- How to work with Azure Synapse.
- How to build an ETL pipeline with the help of Azure Data Factory.
- How to stream data using Azure Stream Analytics.
- The Scala programming language and Spark.
- How to work with Spark and Scala in Azure Databricks.
- How to work with notebooks.
- Streaming data into Azure Databricks.
- The different security measures and monitoring aspects to keep in mind when working with Azure services.