Renaming columns in Polars is a common task. In any given data pipeline, you may need to format the columns with certain case conventions or make the column names more user-friendly. We’ll walk through various ways that you can rename columns.
Code Along
We provide Jupyter notebooks so you can get hands-on experience with various Polars functions. If you want to follow along, download the notebooks from our GitHub repo: https://github.com/CodeCrewCareers/polarscodeacademy
Establishing a Baseline
In many cases, it’s helpful to rename columns as you read your data into Polars. There are a few ways to do this, and we’ll walk you through two of them in this how-to.
We’ve got an employees dataset here. To establish a baseline, we’ll do a quick import and display the data.
import polars as pl
data = pl.read_csv('../datasets/employees.csv')
data.head(1)
shape: (1, 8)
┌────────────┬────────────┬───────────┬────────────┬────────────┬────────────┬────────────┬────────┐
│ employee_i ┆ first_name ┆ last_name ┆ email ┆ job_title ┆ date_of_bi ┆ date_of_hi ┆ salary │
│ d ┆ --- ┆ --- ┆ --- ┆ --- ┆ rth ┆ re ┆ --- │
│ --- ┆ str ┆ str ┆ str ┆ str ┆ --- ┆ --- ┆ i64 │
│ i64 ┆ ┆ ┆ ┆ ┆ str ┆ str ┆ │
╞════════════╪════════════╪═══════════╪════════════╪════════════╪════════════╪════════════╪════════╡
│ 1 ┆ john ┆ doe ┆ john.doe@e ┆ software_e ┆ 1990-05-12 ┆ 2015-08-01 ┆ 85000 │
│ ┆ ┆ ┆ xample.com ┆ ngineer ┆ ┆ ┆ │
└────────────┴────────────┴───────────┴────────────┴────────────┴────────────┴────────────┴────────┘
Take note of the column names. We’ll be altering them in our next code block.
Rename on Import
The first method we’ll use is the new_columns parameter of the read_csv function. We pass it a list of new column names, which are applied in order, so list them in the same order as the headers in the CSV file. To rename every column, make sure the list has one name per header; if the list is shorter, the remaining columns keep their original names.
# new_columns replaces the CSV headers in order, left to right
data = pl.read_csv(
    '../datasets/employees.csv',
    new_columns=['ID', 'FirstName', 'LastName', 'Email',
                 'Job', 'DOB', 'HireDate', 'Salary'],
)
data.head(1)
shape: (1, 8)
┌─────┬───────────┬──────────┬─────────────────┬────────────────┬────────────┬────────────┬────────┐
│ ID ┆ FirstName ┆ LastName ┆ Email ┆ Job ┆ DOB ┆ HireDate ┆ Salary │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ i64 │
╞═════╪═══════════╪══════════╪═════════════════╪════════════════╪════════════╪════════════╪════════╡
│ 1 ┆ john ┆ doe ┆ john.doe@exampl ┆ software_engin ┆ 1990-05-12 ┆ 2015-08-01 ┆ 85000 │
│ ┆ ┆ ┆ e.com ┆ eer ┆ ┆ ┆ │
└─────┴───────────┴──────────┴─────────────────┴────────────────┴────────────┴────────────┴────────┘
You’ll note now that the column names have been updated!
Rename Via Schema
Another way to rename columns on import is to use the schema parameter. This option takes a Polars schema, which maps column names to data types and can be viewed and edited independently of a DataFrame. Every DataFrame carries one, but you can also create one from scratch.
Rather than write out a full schema, you can save and load schemas with the polars-extensions library. If you are interested in learning more about that, we have a full video on how to leverage that capability. You can find it here.
We already have a saved schema, so we first need to load it with the polars-extensions library.
import polars_extensions as plx
schema = plx.read_schema('schema.json')
schema
Schema([('Id', Int64),
('FiRsTnAmE', String),
('LaStNaMe', String),
('EmAiL', String),
('JoBtItLe', String),
('DoB', String),
('HiReDaTe', String),
('SaLaRy', Int64)])
Now that we have our schema, we can use it to overwrite our column names when importing our employees dataset.
data = pl.read_csv('../datasets/employees.csv',schema=schema)
data.head(1)
shape: (1, 8)
┌─────┬───────────┬──────────┬─────────────────┬────────────────┬────────────┬────────────┬────────┐
│ Id ┆ FiRsTnAmE ┆ LaStNaMe ┆ EmAiL ┆ JoBtItLe ┆ DoB ┆ HiReDaTe ┆ SaLaRy │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ i64 │
╞═════╪═══════════╪══════════╪═════════════════╪════════════════╪════════════╪════════════╪════════╡
│ 1 ┆ john ┆ doe ┆ john.doe@exampl ┆ software_engin ┆ 1990-05-12 ┆ 2015-08-01 ┆ 85000 │
│ ┆ ┆ ┆ e.com ┆ eer ┆ ┆ ┆ │
└─────┴───────────┴──────────┴─────────────────┴────────────────┴────────────┴────────────┴────────┘
In the output, we get our updated columns.
Rename via List
The next two methods rename columns after the data is already in Polars. The first is to rename via a list, which is fairly straightforward: assign a list with one name per column to the columns attribute.
data.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
data.head(1)
shape: (1, 8)
┌─────┬──────┬─────┬──────────────────────┬───────────────────┬────────────┬────────────┬───────┐
│ A ┆ B ┆ C ┆ D ┆ E ┆ F ┆ G ┆ H │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ i64 │
╞═════╪══════╪═════╪══════════════════════╪═══════════════════╪════════════╪════════════╪═══════╡
│ 1 ┆ john ┆ doe ┆ john.doe@example.com ┆ software_engineer ┆ 1990-05-12 ┆ 2015-08-01 ┆ 85000 │
└─────┴──────┴─────┴──────────────────────┴───────────────────┴────────────┴────────────┴───────┘
Rename via Dictionary
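The second post-load method is the rename method. It takes a dictionary that explicitly maps each existing column name to its new name. Note that rename returns a new DataFrame rather than modifying data in place, so assign the result if you want to keep it. In our case, we still have the leftover A through H column names from the last step, so we’ll remap them to some more user-friendly names.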
# Map each old column name to its new name
data.rename(mapping={
    'A': 'Emp_ID',
    'B': 'First_Name',
    'C': 'Last_Name',
    'D': 'Email',
    'E': 'Job_Role',
    'F': 'Date_Of_Birth',
    'G': 'Hire_Date',
    'H': 'Salary',
}).head(1)
shape: (1, 8)
┌────────┬────────────┬───────────┬──────────────┬─────────────┬─────────────┬────────────┬────────┐
│ Emp_ID ┆ First_Name ┆ Last_Name ┆ Email ┆ Job_Role ┆ Date_Of_Bir ┆ Hire_Date ┆ Salary │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ th ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ --- ┆ str ┆ i64 │
│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ │
╞════════╪════════════╪═══════════╪══════════════╪═════════════╪═════════════╪════════════╪════════╡
│ 1 ┆ john ┆ doe ┆ john.doe@exa ┆ software_en ┆ 1990-05-12 ┆ 2015-08-01 ┆ 85000 │
│ ┆ ┆ ┆ mple.com ┆ gineer ┆ ┆ ┆ │
└────────┴────────────┴───────────┴──────────────┴─────────────┴─────────────┴────────────┴────────┘
Once again, we have our new columns!
Conclusion
There are many ways to rename columns in Polars. We’ve walked through four of them: two that rename on import (new_columns and schema) and two that rename after the data is loaded (assigning a list and using rename). We hope they help you manage your data better. Want to learn more? We recommend you check out the resources below.