DataFrame Filtering - Learn Spark in Microsoft Fabric
Microsoft Fabric
Sep 15, 2023 4:30 PM

by HubSite 365 about Will Needham (Learn Microsoft Fabric with Will)

Data Strategist & YouTuber

Learn Apache Spark in Microsoft Fabric in the 30 days of September. Here's the playlist for this series if you want to catch up: https://www.youtube.com/playlist

Day 8 - DataFrame Filtering - Learn Spark in Microsoft Fabric (8 of 30)

Spark is the engine behind both the Data Engineering and the Data Science experiences in Microsoft Fabric, so in September I'll be walking you through Apache Spark: what it is, why you should learn it, how to use it, and how it integrates into Microsoft Fabric. No previous Spark knowledge is required, but some basic Python would be useful!

Here's the schedule:

  • Welcome
  • Why Spark?
  • Components of Spark
  • Spark DataFrame
  • Read Files into DataFrame
  • Read/Write to Lakehouse Table
  • Basic DataFrame Operations
  • DataFrame Filtering - THIS VIDEO
  • GroupBy and Aggregate Functions
  • Handling missing values
  • Joining and merging DataFrames
  • Time-series
  • Spark SQL
  • MLlib Feature Engineering
  • MLlib Machine learning models
  • MLlib Model evaluation
  • Microsoft Fabric Runtime powered by Apache Spark
  • Spark Compute
  • Custom Spark pools
  • Spark Job Definitions
  • Managing Spark capacities
  • Library Management
  • Spark Scala
  • Spark R
  • Concurrency
  • Autotuning Spark
  • Fabric MSSparkUtils
  • Monitoring Spark
  • Answering your questions, FAQs
  • Continuing your Spark learning journey

Timeline:

  • Intro
  • Reading data into DataFrame
  • Equal to, not equal to
  • Startswith, Endswith
  • Multiple conditions
  • If in list
  • String contains
  • SQL %LIKE% filtering
  • Other SQL-like filtering operations
  • df.where()

Deep Dive into DataFrame Filtering

This learning series on Apache Spark in Microsoft Fabric aims to simplify complex topics such as DataFrame filtering. This video focuses on filtering: it shows how to read data into a DataFrame and then filter it with equal-to and not-equal-to comparisons, startswith and endswith, and multiple combined conditions. It also covers checking whether a value is in a list, string-contains matching, SQL-style %LIKE% filtering, other SQL-like filtering operations, and the df.where() function. The goal is to deepen your understanding of Microsoft Fabric and Apache Spark. No previous Spark knowledge is required, but some basic Python would come in handy.

Learn about Day 8 - DataFrame Filtering - Learn Spark in Microsoft Fabric (8 of 30)

This is Day 8 of the 30-day learning series 'Learn Apache Spark in Microsoft Fabric', and the topic for the day is DataFrame filtering. The series aims to teach Apache Spark, its integration with Microsoft Fabric, and its use in both Data Engineering and Data Science. It requires no prior knowledge of Spark, but some basic Python skills are beneficial. Later topics in the series range from handling missing values, Spark SQL, and MLlib model evaluation to managing Spark capacities. Links to related material are also provided.

 

More links about Day 8 - DataFrame Filtering - Learn Spark in Microsoft Fabric (8 of 30)

Use Apache Spark in Microsoft Fabric - Training
In this module, you'll learn how to: Configure Spark in a Microsoft Fabric workspace; Identify suitable scenarios for Spark notebooks and Spark jobs ...
data cleansing and preparation - Microsoft Fabric
Jun 12, 2023 — In this tutorial, we demonstrate how to use Apache Spark notebooks to clean and prepare the taxi trips dataset. Spark's optimized distribution ...
Apache Spark runtime in Fabric
May 23, 2023 — The Microsoft Fabric Runtime is an Azure-integrated platform based on Apache Spark that enables the execution and management of data ...
Analyze data with Apache Spark and Python
May 23, 2023 — Create a Spark DataFrame by retrieving the data via the Open Datasets API. Here, we use the Spark DataFrame schema on read properties to infer ...
How to train models with Apache Spark MLlib
May 23, 2023 — Run the following lines to create a Spark DataFrame by pasting the code into a new cell. This step retrieves the data via the Open Datasets API.
Filtering a spark dataframe based on date
Aug 13, 2015 — If your DataFrame date column is of type StringType , you can convert it using the to_date function : // filter data where the date is ...
Fabric end-to-end use case: Data Engineering part 1
Aug 28, 2023 — In this series, we will explore how to use Microsoft Fabric to ingest, transform, and analyze data using a real-world use case.
Learn Spark in Microsoft Fabric in September (2 of 30)
DAY TWO – Why Spark? – Learn Spark in Microsoft Fabric in September (2 of 30). Learn Apache Spark in Microsoft Fabric in the 30 days of September. Here's ...
How to calculate MOVING AVERAGE in a Pandas ...
Jun 15, 2022 — In Python, we can calculate CMA using .expanding() method. Now we will see an example, to calculate CMA for a period of 30 days. Step 1: ...
