Data Analysis with Polars in Python

Instructor

Jayden Rasband


Course Overview

Over the last decade, Python has become increasingly common in data pipelines. However, many pipelines ran into performance problems when processing large datasets in Python, which limited its usefulness for “Big Data” workloads.

In recent years, however, Polars has made it practical to process large datasets in Python with its high-performance data structures. It uses parallel processing to quickly read data into DataFrames and Series.

And its performance doesn’t stop there! Not only can Polars read and write data quickly, it can also manipulate vast amounts of data faster than Pandas.

FAQs

Q: Is the switch from Pandas difficult?

A: No. The basic concepts are the same. There are definitely differences between the two libraries, but their functionality is very similar. If you can do it in Pandas, you can do it in Polars!

Q: I’m already learning Pandas. Would you say I’m wasting my time?

A: No. My first exposure to DataFrames was using Pandas, and many of the concepts I learned there helped me understand Polars. The two do differ in performance, though. Pandas may release a faster version at some point, but for now Polars is much faster when working with large datasets.

Q: Pandas has integrations with many more libraries than Polars. Won’t I be missing out on these if I make the switch?

A: Absolutely not. It’s true that Polars does not have as many integrations with other Python libraries, but switching between a Polars DataFrame and a Pandas DataFrame is easy. Polars provides functions for converting to and from a Pandas DataFrame, so you can get the performance of Polars while keeping the integrations of Pandas. Other libraries have also begun to build integrations with Polars, so this gap may close altogether.

Q: What kind of bear is best?

A: There are basically two schools of thought… Pandas and Polars are indeed competing DataFrame libraries. It’s probably for you to decide the answer to this question!

What You'll Learn

Read CSV files into Polars DataFrames
Write data directly from Polars to a database
Export DataFrames to Excel
Aggregate complex datasets
Join DataFrames together
Utilize Polars' superior processing speed

Targeted Audience

Beginner Data Engineers looking to improve data manipulation skills
Aspiring Data Analysts seeking to learn data discovery practices
Data Engineers looking to utilize Polars in their data pipelines
Pandas users looking to make the switch to Polars

Materials Included

Practice Datasets
Jupyter Notebooks

Course Content

Getting Started
- Welcome to the Course!
- Installing Python
- Installing Visual Studio Code
- Extensions for Visual Studio Code
- Course Materials
DataFrame Inputs & Outputs
- Installing Polars
- Read CSV
- Write CSV - Part 1
- Write CSV - Part 2
- Write Database
- Read Database
- Read Excel
- From Pandas
- Read ODS
- JSON Normalize
- Scan CSV
- Reading Multiple CSV Files
- Write Database (SQL Server)
- Schemas
- LazyFrame Query Optimization
- To Pandas
Selecting From DataFrames
- Select
- Slicing & Sampling
- Filter
- Frame SQL
Joining & Appending to DataFrames
- Inner Joins
- Anti Joins
- Left Joins
Aggregating DataFrames
- Min & Max
- Mean, Median, & Mode
- Quantiles
- Sum
- Rank
Date & Time Functions
- Business Day Count
- Add Business Days
- Handling Time Zones
String Expressions
- Regular Expressions (RegEx)
- Find & Replace
- strptime

