Unpacking Python’s Built-in JSON Library

Introduction

JavaScript Object Notation (JSON) is a text-based data format that’s widely used to transport data across the web. Python has built-in data structures and types that are very similar to JSON’s, but to work with JSON data in Python we first need to deserialize it, that is, convert it to the equivalent Python objects. We can do this with the built-in json library.

Getting the Package

Official Documentation: https://docs.python.org/3/library/json.html

No installation necessary. The json module is part of Python’s standard library, so it comes preloaded with the Python distribution.

What’s Inside?

Classes
- JSONEncoder()
- JSONDecoder()
Functions
- dump()
- dumps()
- load()
- loads()

The Conversion Matrix

There are two main data structures in JSON: Objects and Arrays. When converting JSON to Python, these structures get converted to dictionaries and lists, respectively.

In addition to data structure changes, JSON data types are also converted to Python data types. For example, a JSON number is converted to a Python int if it is a whole number, or to a float if it has a fractional part or an exponent.

The json library enables us to easily convert back and forth between JSON and Python. To understand how this conversion is done, I’ve included a conversion chart below.

JSON	Direction	Python
object `{"key":"value"}`	<–>	dict `{'key':'value'}`
array `[100,200,300]`	<–>	list `[100,200,300]`
array `["Hello","There"]`	<–	tuple `('Hello','There')`
string `"Hello There!"`	<–>	str `'Hello There!'`
number `15`	<–>	int `15`
number (with decimals) `6.99`	<–>	float `6.99`
`true`	<–>	`True`
`false`	<–>	`False`
`null`	<–>	`None`
`Infinity`	<–>	`float('inf')`
`-Infinity`	<–>	`float('-inf')`
`NaN`	<–>	`float('nan')`
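As a quick sanity check of the chart above, the sketch below decodes a small made-up document and inspects the resulting Python types, then round-trips a tuple. The variable names are illustrative only.

```python
import json

# Decode a JSON document that exercises several rows of the chart.
decoded = json.loads(
    '{"count": 15, "price": 6.99, "tags": ["a", "b"], "active": true, "note": null}'
)

# Map each key to the name of the Python type it was decoded into.
types = {key: type(value).__name__ for key, value in decoded.items()}
print(types)

# Tuples are encoded as JSON arrays, so they come back as lists.
round_tripped = json.loads(json.dumps(("Hello", "There")))
print(round_tripped)
```

The tuple row in the chart only points one way for exactly this reason: once encoded, nothing in the JSON says the array used to be a tuple.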

Converting from JSON to Python

Within the json library, there are two functions for converting JSON to Python. One deserializes JSON from a file and the other deserializes JSON strings. These are named load() and loads() respectively. An easy way to remember the difference is that the s at the end of loads() stands for string. Check out how to use each of these below!

json.loads()

Allows you to load JSON strings.

json.loads(s
           ,*
           ,cls=None
           ,object_hook=None
           ,parse_float=None
           ,parse_int=None
           ,parse_constant=None
           ,object_pairs_hook=None
           ,**kw
)

Parameters

s: a str, bytes, or bytearray containing a valid JSON document.
cls (optional): allows you to use a custom JSONDecoder class.
object_hook (optional): allows you to place JSON data into a custom object.
parse_float (optional): allows you to customize how floats are handled when decoding.
parse_int (optional): allows you to customize how integer values are handled when decoding.
parse_constant (optional): allows you to customize how the special constants Infinity, -Infinity, and NaN are handled when decoding.
object_pairs_hook (optional): allows you to customize how key-value pairs are returned. Overrides the default (dict).

Standard Usage

For most Python users, the standard usage is adequate.

import json

json_string = '{"name": "John", "age": 30, "city": "New York"}'
json.loads(json_string)

Output

{'name': 'John', 'age': 30, 'city': 'New York'}

Cls

The cls parameter allows us to pass a custom JSONDecoder subclass. These user-defined JSONDecoders allow you to create your own JSON parsers. This is definitely for more advanced users. We’ll address this parameter later after we show you how to make a JSONDecoder.

Object Hook

Object Hook allows you to take JSON data and directly insert it into custom Python Objects.

Let’s say that we have a ContactInfo class that we want to pass our JSON data into. We’ve defined this class already in the code below. We also need to define an object hook function. We’ll call our function to_contact and then write the logic that puts everything where it needs to go.

import json

class ContactInfo:
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

    def __str__(self):
        return f"Name: {self.name}\nPhone Number: {self.phone_number}\nEmail: {self.email}"


def to_contact(obj_dict):
    if 'name' in obj_dict and 'phone_number' in obj_dict and 'email' in obj_dict:
        return ContactInfo(obj_dict['name'], obj_dict['phone_number'], obj_dict['email'])
    return obj_dict

json_string = '{"name": "John Doe", "phone_number": "123-456-7890", "email": "john@example.com"}'
contact = json.loads(json_string, object_hook=to_contact)
print(contact)
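One detail worth knowing: object_hook is called for every JSON object that gets decoded, including nested ones, which is why to_contact returns the dictionary unchanged when the expected keys are missing. Here is a minimal sketch, using a hypothetical nested payload and a simplified stand-in for the class above:

```python
import json

class ContactInfo:
    """Simplified stand-in for the ContactInfo class defined earlier."""
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

def to_contact(obj_dict):
    # Called once per decoded JSON object, innermost objects first.
    if obj_dict.keys() >= {'name', 'phone_number', 'email'}:
        return ContactInfo(obj_dict['name'], obj_dict['phone_number'], obj_dict['email'])
    return obj_dict

# Hypothetical payload: the contact is nested inside an outer object.
nested_json = (
    '{"id": 7, "contact": {"name": "John Doe", '
    '"phone_number": "123-456-7890", "email": "john@example.com"}}'
)
record = json.loads(nested_json, object_hook=to_contact)

print(type(record).__name__)             # the outer object stays a dict
print(type(record['contact']).__name__)  # the inner object becomes a ContactInfo
```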

Parse Float

Let’s walk through an example of how to use the parse_float parameter. Let’s say that we want our deserialized JSON data to round decimal values to the nearest hundredth.

We’ll use a lambda function that receives each decimal value as a string, converts it to a float, and rounds it. In the example below, it rounds our longer version of Pi to the nearest hundredth, while the integer large_number is left untouched.

import json

json_string = '{"pi": 3.1415, "large_number": 123456789}'
data = json.loads(json_string, parse_float=lambda x: round(float(x), 2))
print(data)
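Another common use of parse_float, mentioned in the official documentation, is passing decimal.Decimal so that values are decoded exactly rather than as binary floats. Here is a sketch with a made-up price field:

```python
import json
from decimal import Decimal

# parse_float receives the original text of each JSON float ("19.99"),
# so Decimal can represent the value exactly.
json_string = '{"price": 19.99, "quantity": 3}'
data = json.loads(json_string, parse_float=Decimal)

print(data)
```

This tends to matter for money and other values where floating-point rounding is not acceptable. Integers such as quantity are not passed to parse_float.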

Parse Int

Parse Int allows us to manipulate integer values as we are decoding them into Python.

Let’s say that we have some number values that are represented in thousands. When we read these numbers into Python, we want to show them in their “natural” form. To do this we can use the parse_int parameter.

import json

  
def to_thousands(string):
    return int(string) * 1000

json_string = '{"number": 10, "representation":"Thousands"}'
parsed_data = json.loads(json_string, parse_int=to_thousands)
print(parsed_data)

Parse Constant

Parse Constant allows you to deal with special values that are not part of standard JSON but that the json library accepts anyway:

Infinity
-Infinity
NaN

By default these values are automatically converted to their Python equivalents (see conversion matrix above). That said, we can change how these are read into Python using the parse_constant parameter.

import json

def custom_parse_constant(constant):
    if constant == '-Infinity':
        return 'missing'
    elif constant == 'Infinity':
        return 'missing'
    elif constant == 'NaN':
        return 'missing'
    else:
        return constant

json_string = '{"value1": Infinity, "value2": -Infinity, "value3": NaN}'
data = json.loads(json_string, parse_constant=custom_parse_constant)
print(data)
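If the data is supposed to be strictly standard JSON, you may prefer to reject these constants outright instead of replacing them. One way to sketch that, using a hypothetical reject_constant helper, is to raise an error from the hook:

```python
import json

def reject_constant(constant):
    # Hypothetical helper: json calls this with 'Infinity', '-Infinity', or 'NaN'.
    raise ValueError(f"Non-standard JSON constant: {constant}")

try:
    json.loads('{"value": NaN}', parse_constant=reject_constant)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```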

Object Pairs Hook

This parameter allows you to override the default return value (dict). The “hook function” receives each decoded JSON object as a list of (key, value) pairs, and whatever it returns is used in place of the dict.

For example, we can use object_pairs_hook to create a dictionary where the keys are all uppercase.

import json


def uppercase_keys(pairs):
    return {key.upper(): value for key, value in pairs} 

json_string = '{"name": "john", "age": 30, "city": "new york"}'
data = json.loads(json_string, object_pairs_hook=uppercase_keys)
print(data)
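Because the hook sees every key-value pair, it can also spot duplicate keys, which a plain dict would silently collapse (the last value wins). Here is a minimal sketch that rejects duplicates; reject_duplicates is an illustrative name:

```python
import json

def reject_duplicates(pairs):
    # pairs is a list of (key, value) tuples in document order.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"Duplicate key: {key}")
        result[key] = value
    return result

# By default, the last duplicate silently wins.
default_result = json.loads('{"a": 1, "a": 2}')
print(default_result)

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(duplicate_error)
```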

json.load()

Allows you to load JSON from a file. It takes the same parameters as the loads function, except that the first argument is a file object (fp) instead of a string. As a result, we will only be reviewing “standard usage”.

json.load(fp
          ,*
          ,cls=None
          ,object_hook=None
          ,parse_float=None
          ,parse_int=None
          ,parse_constant=None
          ,object_pairs_hook=None
          ,**kw)

Parameters

fp: an open file (or any file-like object with a read() method) containing a JSON document.
cls (optional): allows you to use a custom JSONDecoder class.
object_hook (optional): allows you to place JSON data into a custom object.
parse_float (optional): allows you to customize how floats are handled when decoding.
parse_int (optional): allows you to customize how integer values are handled when decoding.
parse_constant (optional): allows you to customize how the special constants Infinity, -Infinity, and NaN are handled when decoding.
object_pairs_hook (optional): allows you to customize how key-value pairs are returned. Overrides the default (dict).

Standard Usage

The name of the fp parameter can be a bit misleading. It doesn’t take a file path; it expects an open file object that it can read the data from. Let’s demonstrate how to do this assuming you have the following data.json file:

{
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "is_student": false,
    "grades": [90, 85, 88]
}

To open this file, you would then run the following code:

import json


with open('data.json', 'r') as json_file:
    data = json.load(json_file)

print(data)
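Since json.load() only calls read() on whatever it is given, any file-like object works, not just files on disk. The sketch below uses io.StringIO as an in-memory stand-in for data.json:

```python
import io
import json

# io.StringIO behaves like an open text file held in memory.
fake_file = io.StringIO('{"name": "John Doe", "grades": [90, 85, 88]}')
data = json.load(fake_file)

print(data["grades"])
```

This can be handy in tests, where creating real files is often more trouble than it is worth.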

Converting from Python to JSON

json.dumps()

Enables you to convert Python data structures to a JSON-formatted string.

json.dumps(obj
           ,*
           ,skipkeys=False
           ,ensure_ascii=True
           ,check_circular=True
           ,allow_nan=True
           ,cls=None
           ,indent=None
           ,separators=None
           ,default=None
           ,sort_keys=False
           ,**kw)

Parameters

obj: A serializable python object such as a dictionary, list, or tuple.
skipkeys (optional): if True, dictionary keys that are not basic types (str, int, float, bool, None) are skipped instead of raising a TypeError. Default is False.
ensure_ascii (optional): if True (the default), non-ASCII characters in the output are escaped with \uXXXX sequences; if False, they are written as-is.
check_circular (optional): if False, the circular reference check is skipped while serializing the object. Default is True.
allow_nan (optional): enables you to allow NaN, Infinity, and -Infinity as JSON number values. Default is True.
cls (optional): allows you to customize the serialization process with a JSONEncoder.
indent (optional): allows you to specify the indentation level to use when pretty-printing the JSON output.
separators (optional): allows you to specify custom separators to use for the JSON output. It should be a tuple of two strings, (item_separator, key_separator): the first one is the separator between items, and the second one is the separator between keys and values. Default is (', ', ': ') when indent is None and (',', ': ') otherwise.
default (optional): allows you to pass a function that will be used for objects that are not serializable by default. Default is None.
sort_keys (optional): allows you to specify whether to sort the keys of dictionaries alphabetically before serializing. Default is False.

Standard Usage

import json

data = {
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "is_student": False,
    "grades": [90, 85, 88]
}

json_string = json.dumps(data)
print(json_string)

Skip Keys

There may be times when your dictionary has a strange key. By strange, I mean a key that JSON cannot represent. Python will raise a TypeError if it finds keys in your dictionary that are not str, int, float, bool or None.

To avoid this error and still produce a valid JSON string, you can skip these problematic keys. For our example, let’s say that we have a tuple as one of our keys. Since it isn’t one of the accepted key data types, json.dumps() will raise an error unless we pass skipkeys=True.

import json


data = {
    (1,2,3): "value1",
    "key2": "value2"
}

# Without skipkeys=True, the following would raise a TypeError:
# json_str_without_skipkeys = json.dumps(data)
# print(json_str_without_skipkeys)

json_str_with_skipkeys = json.dumps(data, skipkeys=True)
print(json_str_with_skipkeys)

Ensure ASCII

JSON text is usually exchanged as UTF-8, which supports a wide range of international characters. By default, however, json.dumps() sets ensure_ascii to True and escapes every non-ASCII character so that the output is plain ASCII. If you would rather keep the characters as they are, set ensure_ascii to False.

import json


data = {
    "name": "José",
    "city": "São Paulo",
    "country": "日本"
}

json_str_ensure_ascii_true = json.dumps(data, ensure_ascii=True)
print(json_str_ensure_ascii_true)

json_str_ensure_ascii_false = json.dumps(data, ensure_ascii=False)
print(json_str_ensure_ascii_false)
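Both outputs describe the same data; only the escaping differs. Here is a quick check, using a smaller version of the dictionary above:

```python
import json

data = {"name": "José", "country": "日本"}

escaped = json.dumps(data, ensure_ascii=True)    # non-ASCII written as \uXXXX escapes
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII characters kept as-is

# str.isascii() reports whether a string contains only ASCII characters.
print(escaped.isascii(), readable.isascii())

# Decoding either string gives back the original dictionary.
print(json.loads(escaped) == json.loads(readable) == data)
```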

Check Circular

You will rarely, if ever, need to change the check_circular parameter. A circular reference is a container that directly or indirectly contains itself, so serializing it would never finish. With the default setting of True, json.dumps() detects this and raises a ValueError. If you set the parameter to False, the check is skipped and you would get a RecursionError (or worse) instead. The important thing to know is that this parameter keeps your code from running endlessly.
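To make this concrete, here is a small sketch of a list that contains itself. With the default check_circular=True, json.dumps() raises a ValueError once it detects the cycle:

```python
import json

# A list that contains itself is the simplest circular reference.
circular = [1, 2]
circular.append(circular)

try:
    json.dumps(circular)
    circular_error = None
except ValueError as exc:
    circular_error = str(exc)

print(circular_error)
```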

Allow NaN

We’ve briefly discussed the special values NaN, Infinity, and -Infinity. The allow_nan parameter gives you control over whether these special values get encoded or not. One of the most common use cases is data validation.

Let’s say that we are sending some JSON data to a system that can’t accept these special values. We want to catch them when exporting so that the system doesn’t fail while loading our data. We can do that by setting allow_nan equal to False, which makes json.dumps() raise a ValueError instead.

import json


data = {
    'Name': float('nan'),
    'Price': float('inf'),
    'CostOfGoods': float('-inf')
}

json_str_allow_nan_true = json.dumps(data, allow_nan=True)
print(json_str_allow_nan_true)

# With allow_nan=False, this call raises a ValueError
json_str_allow_nan_false = json.dumps(data, allow_nan=False)
print(json_str_allow_nan_false)
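In a validation step you would typically catch that error rather than let it stop the program. Here is a sketch, where is_strict_json is a hypothetical helper name:

```python
import json

def is_strict_json(obj):
    """Hypothetical helper: True if obj serializes without NaN/Infinity values."""
    try:
        json.dumps(obj, allow_nan=False)
        return True
    except ValueError:
        return False

print(is_strict_json({'Price': 6.99}))
print(is_strict_json({'Price': float('inf')}))
```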

cls

The cls parameter allows us to pass a custom JSONEncoder subclass. These user-defined JSONEncoders allow you to create your own JSON serialization process. This is definitely for more advanced users. We’ll address this parameter later after we show you how to make a JSONEncoder.

Indent

This parameter is particularly useful when you want to make JSON more readable. You can “pretty-print” the output by setting the indent parameter. Keep in mind that this adds more whitespace and increases the file size. For cases where you want to transport data to another application, it’s best to keep the default setting.

import json


data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "interests": ["music", "sports", "reading"],
    "is_student": True,
    "grades": {"math": 90, "science": 85, "history": 88}
}

json_str_default_indent = json.dumps(data)
print(json_str_default_indent)

json_str_indent_4 = json.dumps(data, indent=4)
print(json_str_indent_4)

Separator

When encoding data into JSON format, we have some flexibility with the separators used in both Objects and Arrays. By default, a comma followed by a space separates items, and a colon followed by a space separates keys from values. We can override this by passing the separators parameter a tuple of two strings: the item separator first, then the key separator.

You might be wondering why someone might want to do this. One reason is that another system uses a custom parser. Keep in mind that a separator such as ' => ' produces output that is no longer valid JSON. For most purposes, the standard separators are just fine.

import json


data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "interests": ["music", "sports", "reading"],
    "is_student": True,
    "grades": {"math": 90, "science": 85, "history": 88}
}


json_str_default_separators = json.dumps(data)
print(json_str_default_separators)

json_str_custom_separators = json.dumps(data, separators=(',', ' => '))
print(json_str_custom_separators)
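A more everyday use of separators is trimming whitespace: the official documentation suggests (',', ':') for the most compact representation. The sketch below compares the default and compact forms of a small made-up dictionary:

```python
import json

data = {"name": "John", "interests": ["music", "sports"]}

default_output = json.dumps(data)
# Item separator first, key separator second; no spaces in either.
compact_output = json.dumps(data, separators=(',', ':'))

print(default_output)
print(compact_output)
```

Unlike the ' => ' example, the compact form is still valid JSON and loads back to the same data.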

Default

There are times when you may want to serialize objects that cannot be serialized by default. Remember our custom ContactInfo class from earlier? This is the perfect example for utilizing the default parameter. We simply need to define a new function that converts the object to a dictionary and then pass it in.

import json

class ContactInfo:
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

    def __str__(self):
        return f"Name: {self.name}\nPhone Number: {self.phone_number}\nEmail: {self.email}"

def contact_to_dict(contact):
    return {
        'name': contact.name,
        'phone_number': contact.phone_number,
        'email': contact.email
    }

contact = ContactInfo("John Doe", "123-456-7890", "john@example.com")

json_string = json.dumps(contact, default=contact_to_dict)
print(json_string)

This is perfect for on-the-fly serialization. That said, another way to accomplish this same task is to give the class its own method that exports the object as JSON.

import json

class ContactInfo:
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

    def __str__(self):
        return f"Name: {self.name}\nPhone Number: {self.phone_number}\nEmail: {self.email}"
    
    def to_json(self):
        return json.dumps({
            'name': self.name,
            'phone_number': self.phone_number,
            'email': self.email
        })

contact = ContactInfo("John Doe", "123-456-7890", "john@example.com")

json_string = contact.to_json()
print(json_string)

Sort Keys

As the name implies, sort keys allows you to sort the keys when serializing your JSON data. To sort the keys and override the default, simply pass True into the sort_keys parameter.

import json

data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "interests": ["music", "sports", "reading"],
    "is_student": True,
    "grades": {"math": 90, "science": 85, "history": 88}
}

json_str_unsorted_keys = json.dumps(data, sort_keys=False)
print(json_str_unsorted_keys)

json_str_sorted_keys = json.dumps(data, sort_keys=True)
print(json_str_sorted_keys)

json.dump()

Enables you to write Python data structures to a file as JSON. It takes the same parameters as the dumps function, plus the additional fp parameter. As a result, we will only be reviewing “standard usage”.

json.dump(obj
          ,fp
          ,*
          ,skipkeys=False
          ,ensure_ascii=True
          ,check_circular=True
          ,allow_nan=True
          ,cls=None
          ,indent=None
          ,separators=None
          ,default=None
          ,sort_keys=False
          ,**kw)

Parameters

obj: A serializable python object such as a dictionary, list, or tuple.
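fp: an open file (or any file-like object with a write() method) that the JSON output is written to.
skipkeys (optional): if True, dictionary keys that are not basic types (str, int, float, bool, None) are skipped instead of raising a TypeError. Default is False.
ensure_ascii (optional): if True (the default), non-ASCII characters in the output are escaped with \uXXXX sequences; if False, they are written as-is.
check_circular (optional): if False, the circular reference check is skipped while serializing the object. Default is True.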
allow_nan (optional): enables you to allow NaN, Infinity, and -Infinity as JSON number values. Default is True.
cls (optional): allows you to customize the serialization process with a JSONEncoder.
indent (optional): allows you to specify the indentation level to use when pretty-printing the JSON output.
separators (optional): allows you to specify custom separators to use for the JSON output. It should be a tuple of two strings, (item_separator, key_separator): the first one is the separator between items, and the second one is the separator between keys and values. Default is (', ', ': ') when indent is None and (',', ': ') otherwise.
default (optional): allows you to pass a function that will be used for objects that are not serializable by default. Default is None.
sort_keys (optional): allows you to specify whether to sort the keys of dictionaries alphabetically before serializing. Default is False.

Standard Usage

The name of the fp parameter can be a bit misleading. It doesn’t take a file path; it expects an open file object that it can write the data into.

import json

data = {
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "is_student": False,
    "grades": [90, 85, 88]
}

with open('data.json', 'w') as json_file:
    json.dump(data, json_file)
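A quick way to confirm the file was written correctly is to read it back with json.load(). The sketch below does this in a temporary directory (an assumption made here so that it doesn’t overwrite any real data.json):

```python
import json
import os
import tempfile

data = {"name": "John Doe", "is_student": False, "grades": [90, 85, 88]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.json')

    # Write the dictionary out as JSON...
    with open(path, 'w') as json_file:
        json.dump(data, json_file)

    # ...then load it again and compare with the original.
    with open(path, 'r') as json_file:
        restored = json.load(json_file)

print(restored == data)
```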

Creating a Custom JSONDecoder

The json library gives you the flexibility to create custom JSON decoders. Let’s show you how to create one now!

We’ll create a new class called CustomDecoder. This class will inherit from the JSONDecoder class. JSONDecoder has a method called decode, so we will be using that, but we will also be redefining what that method does within our CustomDecoder.

We’re going to tell our new decoder to make all top-level string values in our Objects uppercase. And we’ll do the same for all top-level items in our Arrays.

import json

class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        decoded_obj = super().decode(s)
    
        if isinstance(decoded_obj, dict):
            return {k: v.upper() if isinstance(v, str) else v for k, v in decoded_obj.items()}
        elif isinstance(decoded_obj, list):
            return [item.upper() if isinstance(item, str) else item for item in decoded_obj]
        else:
            return decoded_obj

Now that we’ve successfully created the custom decoder, we can utilize it on some sample data.

json_data = '{"name": "john", "age": 30, "city": "new york"}'
decoder = CustomDecoder()
decoded_data = decoder.decode(json_data)
print(decoded_data)

OUTPUT:

{'name': 'JOHN', 'age': 30, 'city': 'NEW YORK'}

When you run the code, you should now see a Python dictionary with all the string values in an uppercase format.
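As mentioned in the json.loads() section, the decoder class can also be handed to json.loads() through the cls parameter instead of being instantiated by hand. Note that a decoder written this way only touches top-level values; strings nested deeper are left as they are. Here is a sketch that repeats a trimmed version of the class so it runs on its own, with a made-up nested payload:

```python
import json

class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        decoded_obj = super().decode(s)
        # Uppercase only the top-level string values of an object.
        if isinstance(decoded_obj, dict):
            return {k: v.upper() if isinstance(v, str) else v
                    for k, v in decoded_obj.items()}
        return decoded_obj

json_data = '{"name": "john", "pet": {"name": "rex"}}'
# json.loads() instantiates the class and calls its decode() method for us.
decoded_data = json.loads(json_data, cls=CustomDecoder)
print(decoded_data)
```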

Creating a Custom JSONEncoder

Now that we’ve created a custom decoder, let’s walk through what you can do with JSONEncoder. Sets are a data structure in Python that is not normally JSON serializable. We can change this by writing a custom encoder.

We simply need to create a new class and tell Python to convert any set it finds to a list. The encoder below also falls back to an object’s __dict__ when it has one.

import json

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        elif hasattr(obj, '__dict__'):
            return obj.__dict__
        else:
            return super().default(obj)

Let’s test out our new encoder on some data containing a set.

data = {
    'name': 'John',
    'age': 30,
    'interests': {'coding', 'reading', 'traveling'},
    'address': {
        'street': '123 Main St',
        'city': 'Anytown',
        'country': 'USA'
    }
}

encoded_data = json.dumps(data, cls=CustomJSONEncoder, indent=4)
print(encoded_data)

OUTPUT:

{
    "name": "John",
    "age": 30,
    "interests": [
        "reading",
        "coding",
        "traveling"
    ],
    "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "country": "USA"
    }
}

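When you run the code, you should get output like the above. Notice how the set is now a list. Because sets are unordered, the items under interests may appear in a different order on your machine.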
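For output that is identical on every run, one option is to sort the set inside default(). Here is a sketch, with SortedSetEncoder as an illustrative name and assuming the set’s items can be compared with each other:

```python
import json

class SortedSetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            # Sorting gives the same list on every run.
            return sorted(obj)
        # Fall back to the base class, which raises TypeError.
        return super().default(obj)

encoded = json.dumps(
    {'interests': {'coding', 'reading', 'traveling'}},
    cls=SortedSetEncoder,
)
print(encoded)
```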

Conclusion

The built-in json library is extremely useful for working with JSON data. With its robust functionality and ability to customize how you encode and decode JSON, you can accomplish a lot. If you’re interested in learning more about this topic, check out my course on Udemy!
