Introduction
UNIX timestamps are ubiquitous in computing. They typically represent the number of seconds that have elapsed since the Unix epoch: midnight UTC on January 1, 1970. In this article, we show you how to work with this unique data type in Polars. If you’d prefer a video version, we’ve embedded that here.
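Before we dive into Polars, the definition itself is easy to verify with nothing but Python’s standard library:

```python
from datetime import datetime, timezone

# A UNIX timestamp of 0 is, by definition, the epoch itself.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch)  # 1970-01-01 00:00:00+00:00
```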
Importing Polars
Let’s begin by importing Polars.
import polars as pl
Loading Data
Next, we need to load our dataset. We have a CSV file containing UNIX timestamps, so let’s read it into our notebook.
data = pl.read_csv("../../Datasets/unix_timestamps.csv")
data
And you can see that we now have some UNIX timestamps represented as integers. These particular timestamps are in seconds, and that will be a crucial piece of information going forward.
Understanding Unix Timestamps
Okay, now that we have our UNIX timestamps ready to go, we can start converting them into a more human-readable format.
To do that, we’re going to use the from_epoch function. I’ve been pretty active in the Polars Discord server, and I saw a discussion there that became the inspiration for this video. User djduque had been converting UNIX timestamps in pandas and wanted to know how to do the same thing in Polars. Following the conversation, I saw that user etrotta gave the brilliant advice to use the from_epoch function.

etrotta also made a spot-on observation that from_epoch is not a member of the temporal expression section of the Polars library, but it should be. I couldn’t agree more, and perhaps the Polars dev team will consider moving the expression in version 2.0.

For most of you watching this video, the location of the function doesn’t matter, so let’s move on and dive into how to actually use from_epoch.
The first thing you should know is what time unit your epoch timestamps are in. The most common is seconds, which is what we have in our dataset, but epoch timestamps come in other time units as well. Polars supports "d" (days), "s" (seconds), "ms" (milliseconds), "us" (microseconds), and "ns" (nanoseconds): "d" produces a Date, while the others each produce a Datetime at the corresponding precision (seconds are stored at microsecond precision).
Again, the most common unit is seconds, so if you choose not to specify one, that is what will be used.
Converting from Epoch Format
With that said, let’s convert our UNIX timestamps.
df = data.with_columns(pl.from_epoch(column='UNIX Timestamp', time_unit='s').alias('Epoch'))
df
Here’s the updated DataFrame, which now includes both the original UNIX timestamps and their corresponding datetimes. This allows us to easily read and interpret the dates.
Exploring Different Epoch Formats
Now, it wouldn’t be a true conversion tool if you couldn’t change the data back to the original format, and Polars provides the ability to do this with the epoch expression. epoch is a member of the temporal expressions section, and it does the exact opposite of from_epoch.

I hate to keep harping on this, but in addition to moving from_epoch to the temporal expressions section, I think it would make a lot more sense to rename epoch to to_epoch. A naming convention using to and from would help clarify the functionality. But I digress, because we simply want to see what the expression can do.

We mentioned that UNIX timestamps can be represented in different time units. Let’s use the epoch expression to display our timestamps in several of them.
df.with_columns(
    pl.col("Epoch").dt.epoch(time_unit="d").alias("epoch_date"),
    pl.col("Epoch").dt.epoch(time_unit="s").alias("epoch_seconds"),
    pl.col("Epoch").dt.epoch(time_unit="ms").alias("epoch_milliseconds"),
    pl.col("Epoch").dt.epoch(time_unit="us").alias("epoch_microseconds"),
    pl.col("Epoch").dt.epoch(time_unit="ns").alias("epoch_nanoseconds"),
)
In the output, we see the different versions of our timestamps. Notice how the epoch_seconds column matches our original column. The further right you go, the greater the precision. The epoch_date column is the least precise: because it counts whole days since the epoch rather than seconds, any hours, minutes, and seconds are truncated away.
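That day-level loss is just integer division. A sketch in plain Python makes the truncation explicit:

```python
ts = 1_000_000_000            # seconds since the epoch
days = ts // 86_400           # 86,400 seconds per day; the remainder is discarded
print(days)                   # 11574
print(days * 86_400)          # 999993600, not the original 1000000000
```

Converting back from days can therefore be off by up to a day’s worth of seconds.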
Conclusion
Epoch timestamps solve a number of problems, including precision and storage efficiency. You might see them in log files, API calls, or even within your databases. Knowing how to work with this data is extremely useful, and now you know how with Polars.