Handling Epoch Timestamps

Introduction

UNIX timestamps are common in computing. They usually represent the number of seconds that have elapsed since the Unix epoch, which is 00:00:00 UTC on January 1, 1970. In this article, we show you how to work with this data type in Polars. If you’d prefer a video version, we’ve embedded that here.

Importing Polars

Let’s begin by importing polars.

import polars as pl

Loading Data

Next, we need to load our dataset. We have a CSV file that contains UNIX timestamps, so let’s read it into our notebook.

data = pl.read_csv("../../Datasets/unix_timestamps.csv")
data

You can see that we now have some UNIX timestamps represented as integers. These particular timestamps are in seconds, and that will be a crucial piece of information going forward.
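To make "seconds since the epoch" concrete, here is a minimal sketch using only the Python standard library. The value 1_700_000_000 is a hypothetical sample, not a row from the article's CSV.

```python
from datetime import datetime, timezone

# Hypothetical sample value; it is not taken from unix_timestamps.csv.
ts_seconds = 1_700_000_000

# Interpret the integer as seconds elapsed since 1970-01-01 00:00:00 UTC.
as_utc = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
print(as_utc.isoformat())  # 2023-11-14T22:13:20+00:00

# The same instant expressed in milliseconds is simply 1000 times larger.
ts_millis = ts_seconds * 1_000
print(ts_millis)
```

The integer alone does not say which unit it is in, which is why knowing the unit up front matters.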

Understanding Unix Timestamps

Okay, now that we have our UNIX timestamps ready to go, we can start converting them into a more human-readable format.

To do that, we’re going to use the from_epoch function. I’ve been pretty active in the Polars Discord server, and I saw a discussion that became the inspiration for this video.

User djduque had been converting UNIX timestamps in pandas and wanted to know how to do the same thing in Polars.

Following the conversation, I saw that user etrotta gave the brilliant advice to use the from_epoch function.

etrotta also made a spot-on observation: from_epoch is not a member of the temporal expressions section of the Polars library, but it should be. I couldn’t agree more, and perhaps the Polars dev team will consider moving the expression in version 2.0.

For most of you watching this video, the location of the function does not matter, so let’s move on and dive into how to use from_epoch.

The first thing you should know is what time unit your epoch timestamp is in. The most common is seconds, and that is what we have in our dataset. But you can also have epoch timestamps in other time units. from_epoch supports days ("d"), seconds ("s"), milliseconds ("ms"), microseconds ("us"), and nanoseconds ("ns"): days translate to the Polars Date data type, and the other units translate to Datetime.

Again, seconds is the most common unit, so if you choose not to specify one, that is what will be used.

Converting to Epoch Format

With that said, let’s convert our UNIX timestamps.

df = data.with_columns(pl.from_epoch(column='UNIX Timestamp', time_unit='s').alias('Epoch'))
df

Here’s the updated DataFrame, which now includes both the original UNIX timestamps and their corresponding datetimes. This allows us to easily read and interpret the dates.

Exploring Different Epoch Formats

Now it wouldn’t be a true conversion tool if you couldn’t change the data back to the original format. Polars provides the ability to do this with the epoch expression. epoch is a member of the temporal expressions section (you reach it through the dt namespace) and it does the exact opposite of from_epoch.

I hate to keep harping on this, but in addition to moving from_epoch to the temporal expressions section, I think it would make a lot more sense to rename epoch as to_epoch. This naming convention using to and from would help clarify the functionalities. But again I digress because we simply want to see what the functionality can do.

We mentioned that UNIX timestamps can be represented in different time units. Let’s use the epoch expression to display our timestamps in some different units.

df.with_columns(pl.col("Epoch").dt.epoch(time_unit="d").alias("epoch_date"),
                pl.col("Epoch").dt.epoch(time_unit="s").alias("epoch_seconds"),
                pl.col("Epoch").dt.epoch(time_unit="ms").alias("epoch_milliseconds"),
                pl.col("Epoch").dt.epoch(time_unit="us").alias("epoch_microseconds"),
                pl.col("Epoch").dt.epoch(time_unit="ns").alias("epoch_nanoseconds")
                )

In the output, we see the different versions of our timestamps. Notice how the epoch_seconds column matches our original column. Moving from seconds to milliseconds, microseconds, and nanoseconds, each unit is finer than the last, although our source data only has second resolution, so the extra digits here are all zeros. The epoch_date column is the least precise and actually loses information, since it represents the number of whole days that have elapsed since the epoch instead of seconds.

Conclusion

Epoch timestamps solve a number of problems, including precision and storage efficiency. You might see them in log files, API calls, or even within your databases. Knowing how to work with this data is extremely useful, and now you know how to do it with Polars.
