Course Highlights
- Submitted to NCVET for NSQF Alignment.
- The Data Lake Workshop (DLKW) is the fourth workshop in Snowflake's Hands-on Essentials Workshop series. The workshop gives learners extended practice querying data before it is loaded, and in iteratively developing file formats through that querying process. Learners also work with geospatial data and use many of Snowflake's built-in geospatial functions.
- The workshop requires hands-on lab work to earn a badge. The lab work is auto-graded.
- Skill Type
- Course Duration
- Domain
- GOI Incentive Applicable
- Course Category
- Nasscom Assessment
- Placement Assistance
- Certificate Earned
- Content Alignment Type
- NOS Details
- Mode of Delivery
Course Details
What will you learn in Snowflake Hands-on Essentials Series Part 4 - Data Lake?
- Grasp how Snowflake can be used as a data lake, handling diverse data formats and volumes.
- Understand the benefits of using Snowflake for data lake workloads.
- Learn how to create and manage external tables that point to data in cloud storage (e.g., S3, Azure Blob Storage, Google Cloud Storage).
- Understand the advantages of querying data directly in cloud storage.
- Gain proficiency in querying and processing various data formats commonly found in data lakes, including:
- Parquet
- ORC
- Avro
- CSV
- JSON
- Understand how to use Snowflake to interact with semi-structured data.
- Learn how to organize and manage data within a Snowflake data lake.
- Understand how to optimize query performance for data lake workloads.
- Learn how to secure data within the data lake.
- Learn how to transform and prepare data within the data lake using Snowflake's SQL capabilities.
- Understand how to use Snowflake to create data pipelines for data lake processing.
- Understand how Snowflake's architecture supports high-performance querying and scalability for data lake workloads.
- Understand how virtual warehouses affect data lake workloads.
- Develop hands-on skills through practical exercises and labs.
- Become comfortable with performing common data lake tasks in Snowflake.
- Become familiar with automated lab grading.
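Several of these skills center on querying staged files before loading them. A minimal sketch of that pattern follows; the stage, file, and format names are illustrative, not taken from the course:

```sql
-- Assumes a stage named my_stage already holds a pipe-delimited file;
-- all object names here are illustrative.
CREATE FILE FORMAT my_pipe_format
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 1;

-- Query the not-yet-loaded file positionally with $1, $2, ...
SELECT $1 AS first_col, $2 AS second_col
FROM @my_stage/my_file.txt
  (FILE_FORMAT => 'my_pipe_format')
LIMIT 10;
```

Iterating on the file format settings while re-running a query like this is the discovery process the workshop drills.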
Why should you take Snowflake Hands-on Essentials Series Part 4 - Data Lake course?
At the end of this course, the learner will be able to use Snowflake to:
- Query not-yet-loaded data via Snowflake Worksheets.
- Work with geospatial data and geospatial functions.
- Create an external table.
- Create a materialized view.
- Create an Apache Iceberg table, query it, and run a DML statement on the data.
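Two of these outcomes, external tables and materialized views, combine naturally. A hedged sketch, assuming an external stage over cloud storage that contains Parquet files (all object names are illustrative):

```sql
-- Point an external table at Parquet files in cloud storage.
CREATE EXTERNAL TABLE my_ext_table
  LOCATION = @my_ext_stage
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;

-- External tables expose each row through a VALUE variant column.
-- A secure materialized view can pre-compute and protect a projection of it.
CREATE SECURE MATERIALIZED VIEW my_secure_mv AS
  SELECT VALUE:city::STRING AS city
  FROM my_ext_table;
```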
Who should take Snowflake Hands-on Essentials Series Part 4 - Data Lake course?
- Designed for people new to Snowflake or new to database work in general. The course can be used by managers who simply want to understand what Snowflake is generally capable of, or by those considering a career as a data professional. Likewise, seasoned data professionals will find the courses in this series a quick and easy introduction to tasks they already know, in a tool they do not.
Curriculum for the Snowflake Hands-on Essentials Series Part 4 - Data Lake Course
- Working with External Tables:
- Creating and managing external tables.
- Querying data directly in cloud storage (S3, Azure Blob Storage, Google Cloud Storage).
- Understanding the benefits and use cases of external tables.
- Handling Diverse Data Formats:
- Querying and processing various data lake formats:
- Parquet
- ORC
- Avro
- CSV
- JSON
- Working with semi-structured data.
- Data Lake Best Practices:
- Data organization and management within a Snowflake data lake.
- Query performance optimization.
- Data security considerations.
- Data Lake Transformations:
- Using Snowflake's SQL capabilities for data transformation.
- Building data pipelines for data lake processing.
- Snowflake's Performance and Scalability:
- Understanding how Snowflake's architecture supports data lake workloads.
- Virtual warehouse management for data lake tasks.
- Practical Application:
- Hands-on labs and exercises.
- Automated lab grading with DORA.
- Scenario-based learning.
- Working with geospatial data.
- Working with unstructured data.
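The geospatial portion of the curriculum relies on Snowflake's built-in ST_* functions. A small illustrative sketch (the coordinates are arbitrary examples, not course data):

```sql
-- Build two GEOGRAPHY points and measure the distance between them.
-- ST_DISTANCE on GEOGRAPHY values returns meters.
SELECT ST_DISTANCE(
         ST_MAKEPOINT(-122.35, 37.55),
         ST_MAKEPOINT(-122.40, 37.60)
       ) AS meters_apart;

-- Render a point as Well-Known Text (WKT), the format used in
-- tools like WKT Playground.
SELECT ST_ASWKT(ST_MAKEPOINT(-122.35, 37.55)) AS point_wkt;
```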
Tools you will learn in Snowflake Hands-on Essentials Series Part 4 - Data Lake course
- Classify data according to whether it is structured, semi-structured, or unstructured.
- Create an internal Snowflake-managed stage that uses client-side encryption (CSE).
- Create an internal Snowflake-managed stage that uses server-side encryption (SSE).
- CHALLENGE: Create a database and Stage.
- Query staged files using the $1 syntax.
- Test different file formats with staged files to understand the file structure better.
- Complete intensive, discovery-style exercises around file formats: given a few unusual files, work out the file format settings needed to parse the data.
- Query staged unstructured data using metadata columns.
- CHALLENGE: Write a query that estimates the sizes of files in a stage using GROUP BY, MAX, COUNT, etc.
- Turn on the directory table for a stage.
- Refresh the directory table on a stage.
- CHALLENGE: Construct a nested inline use of multiple functions.
- Intentionally create a cartesian product with a cross join.
- Create and use pre-signed URLs.
- Use Google Earth, Google Maps and WKT Playground to get familiar with points, polygons, and linestrings.
- SUPER-CHALLENGE: Create a database, a schema, two stages, and file formats.
- Query a staged Parquet file.
- Use coordinates to build a linestring object.
- Use the LINESTRING() function.
- Use the TO_GEOGRAPHY() function.
- Use the ST_XMIN() function.
- Use the ST_XMAX() function.
- Use the ST_YMIN() function.
- Use the ST_YMAX() function.
- Use geospatial functions to create a polygon that displays as a bounding box.
- Find a listing on the marketplace and add it to a Snowflake account.
- Make use of local SQL variables in a worksheet.
- Use the ST_DISTANCE function.
- Use the ST_MAKEPOINT function.
- Write a UDF that uses a constant and variables.
- Use the ST_ASWKT function.
- Create a simple external table.
- Create a secure materialized view using an external table as the basis.
- Create an external AWS stage.
- Create an external volume for Apache Iceberg.
- Create a database that specifies Snowflake as the Catalog and references an external volume created for Apache Iceberg.
- Create an Apache Iceberg table using a CTAS that pulls from a Parquet file.
- Run an update statement on an Apache Iceberg table.
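The final items on this list fit together as one flow: create a Snowflake-catalog Iceberg table from staged Parquet data, then run DML on it. A hedged sketch, assuming an external volume, a stage, and a Parquet file format already exist (all names are illustrative):

```sql
-- Create a Snowflake-managed Iceberg table via CTAS from a staged
-- Parquet file; EXTERNAL_VOLUME and stage names are assumptions.
CREATE ICEBERG TABLE my_iceberg_table
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_ext_volume'
  BASE_LOCATION = 'my_iceberg_data'
AS
SELECT $1:id::INT AS id, $1:name::STRING AS name
FROM @my_stage/my_file.parquet
  (FILE_FORMAT => 'my_parquet_format');

-- DML works on Snowflake-managed Iceberg tables much like native tables.
UPDATE my_iceberg_table
SET name = 'renamed'
WHERE id = 1;
```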