Introduction
JavaScript Object Notation (JSON) is a data format used to transport data across the web. Python has built-in data structures and types that are very similar to JSON’s, but in order to interact with JSON data in Python we need to deserialize it, that is, convert it to the equivalent Python objects. We can do this with the built-in json library.
Getting the Package
Official Documentation: https://docs.python.org/3/library/json.html
No installation necessary. The package ships with the standard Python distribution.
What’s Inside?
- Classes
- JSONEncoder()
- JSONDecoder()
- Functions
- dump()
- dumps()
- load()
- loads()
The Conversion Matrix
There are two main data structures in JSON: Objects and Arrays. When converting JSON to Python, these structures get converted to dictionaries and lists, respectively.
In addition to data structure changes, JSON data types are also converted to Python data types. For example, a JSON number written without a decimal point or exponent is converted to Python’s int, while a number with decimals is converted to float.
The json library enables us to easily convert back and forth between JSON and Python. To understand how this conversion is done, I’ve included a conversion chart below.
JSON | Direction | Python |
---|---|---|
object {"key":"value"} | <–> | dict {'key':'value'} |
array [100,200,300] | <–> | list [100,200,300] |
array ["Hello","There"] | <– | tuple ('Hello','There') |
string "Hello There!" | <–> | str 'Hello There!' |
number 15 | <–> | int 15 |
number (with decimals) 6.99 | <–> | float 6.99 |
true | <–> | True |
false | <–> | False |
null | <–> | None |
Infinity | <–> | inf |
-Infinity | <–> | -inf |
NaN | <–> | nan |
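The chart can be checked with a quick round trip. The sketch below uses made-up sample values (the variable names are assumptions for illustration) and shows that a tuple is written as a JSON array but comes back as a list.

```python
import json

# Illustrative sample values; not taken from the article.
original = {"point": (1, 2), "price": 6.99, "active": True, "note": None}

encoded = json.dumps(original)   # Python -> JSON string
decoded = json.loads(encoded)    # JSON string -> Python

print(encoded)
print(decoded)
# The tuple was written as a JSON array, so it is read back as a list.
print(type(decoded["point"]))
```

This is why the tuple row in the chart only points in one direction.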
Converting from JSON to Python
Within the json library, there are two functions for converting JSON to Python. One deserializes JSON from a file and the other deserializes JSON strings. These are named load() and loads() respectively. An easy way to remember the difference is that the s on the second function stands for string. Check out how to use each of these below!
json.loads()
Allows you to load JSON strings.
json.loads(s
,*
,cls=None
,object_hook=None
,parse_float=None
,parse_int=None
,parse_constant=None
,object_pairs_hook=None
,**kw
)
Parameters
- s: a str, bytes, or bytearray instance containing valid JSON.
- cls (optional): allows you to use a custom JSONDecoder class.
- object_hook (optional): allows you to place JSON data into a custom object.
- parse_float (optional): allows you to customize how floats are handled when decoding.
- parse_int (optional): allows you to customize how integer values are handled when decoding.
- parse_constant (optional): allows you to customize how the special constants (-Infinity, Infinity, NaN) are handled when decoding.
- object_pairs_hook (optional): allows you to customize how key-value pairs are returned. Overrides the default (dict).
Standard Usage
For most Python users, the standard usage is adequate.
import json
json_string ='{"name": "John", "age": 30, "city": "New York"}'
json.loads(json_string)
Output
{'name': 'John', 'age': 30, 'city': 'New York'}
Cls
The cls parameter allows us to pass a custom JSONDecoder class. These user-defined JSONDecoders allow you to create your own JSON parsers. This is definitely for more advanced users. We’ll address this parameter later, after we show you how to make a JSONDecoder.
Object Hook
Object Hook allows you to take JSON data and directly insert it into custom Python Objects.
Let’s say that we have a ContactInfo class that we want to pass our JSON data into. We’ve defined this class already in the code below. We also need to define an object hook function. We’ll call our function to_contact and then write the logic that puts everything where it needs to go.
import json
class ContactInfo:
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

    def __str__(self):
        return f"Name: {self.name}\nPhone Number: {self.phone_number}\nEmail: {self.email}"

def to_contact(obj_dict):
    # Build a ContactInfo only when all of the expected keys are present.
    if 'name' in obj_dict and 'phone_number' in obj_dict and 'email' in obj_dict:
        return ContactInfo(obj_dict['name'], obj_dict['phone_number'], obj_dict['email'])
    # Otherwise, leave the decoded dictionary unchanged.
    return obj_dict

json_string = '{"name": "John Doe", "phone_number": "123-456-7890", "email": "[email protected]"}'
contact = json.loads(json_string, object_hook=to_contact)
print(contact)
Parse Float
Let’s walk through an example of how to use the parse_float parameter. Let’s say that we want our deserialized JSON data to round decimal values to the nearest hundredth.
We’ll use a lambda function to take all identified decimal values and round them. This function will take our long form version of Pi in our JSON data and round it to the nearest hundredth decimal.
import json
json_string = '{"pi": 3.1415, "large_number": 123456789}'
data = json.loads(json_string, parse_float=lambda x: round(float(x), 2))
print(data)
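Another common use of parse_float is to avoid binary floating-point rounding altogether. The sketch below, with made-up sample data, passes the standard library’s decimal.Decimal as the parser, so every JSON number with a decimal point arrives as an exact Decimal while integers are left alone.

```python
import json
from decimal import Decimal

# Illustrative sample data; not taken from the article.
json_string = '{"price": 6.99, "quantity": 3}'

# parse_float receives the number as a string, which Decimal accepts directly.
data = json.loads(json_string, parse_float=Decimal)
print(data)
```

This can be useful for prices and other values where exact decimal arithmetic matters.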
Parse Int
Parse Int allows us to manipulate integer values as we are decoding them into python.
Let’s say that we have some number values that are represented in thousands. When we read these numbers into Python, we want to show them in their “natural” form. To do this we can use the parse_int parameter.
import json
def to_thousands(string):
    # parse_int receives each integer as a string.
    return int(string) * 1000
json_string = '{"number": 10, "representation":"Thousands"}'
parsed_data = json.loads(json_string, parse_int=to_thousands)
print(parsed_data)
Parse Constant
Parse Constant allows you to deal with special JSON Values including:
- Infinity
- -Infinity
- NaN
By default these values are automatically converted to their Python equivalents (see the conversion matrix above). That said, we can change how they are read into Python using the parse_constant parameter.
import json
def custom_parse_constant(constant):
    # constant is the raw text found in the JSON: '-Infinity', 'Infinity', or 'NaN'.
    if constant == '-Infinity':
        return 'missing'
    elif constant == 'Infinity':
        return 'missing'
    elif constant == 'NaN':
        return 'missing'
    else:
        return constant
json_string = '{"value1": Infinity, "value2": -Infinity, "value3": NaN}'
data = json.loads(json_string, parse_constant=custom_parse_constant)
print(data)
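Since these special values are not part of the JSON standard, you may prefer to refuse them outright. The following is a minimal sketch of that idea; reject_constant is a hypothetical helper name, and it simply raises an error whenever one of the special constants is encountered.

```python
import json

def reject_constant(name):
    # name is the raw constant text: 'NaN', 'Infinity', or '-Infinity'.
    raise ValueError(f"Non-standard JSON constant: {name}")

try:
    json.loads('{"value": NaN}', parse_constant=reject_constant)
except ValueError as error:
    message = str(error)
    print(message)
```

Input that contains only standard values is unaffected, because the hook is never called for it.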
Object Pairs Hook
This parameter allows you to override the default return value (dict) with whatever is returned by the “hook function”. The hook function receives each decoded JSON object as a list of (key, value) pairs.
For example, we can use object_pairs_hook to create a dictionary where the keys are all uppercase.
import json
def uppercase_keys(pairs):
    return {key.upper(): value for key, value in pairs}
json_string = '{"name": "john", "age": 30, "city": "new york"}'
data = json.loads(json_string, object_pairs_hook=uppercase_keys)
print(data)
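Because the hook sees every pair before a dictionary is built, it can also catch duplicate keys, which a plain dict would silently overwrite. This is a sketch under that assumption; reject_duplicates is a hypothetical helper name.

```python
import json

def reject_duplicates(pairs):
    # pairs is a list of (key, value) tuples in document order.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"Duplicate key: {key}")
        seen[key] = value
    return seen

print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates))

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as error:
    duplicate_error = str(error)
    print(duplicate_error)
```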
json.load()
Allows you to load JSON from a file. It takes all of the same parameters as the loads function, except that the first argument is the fp parameter instead of a string. As a result, we will only be reviewing “standard usage”.
json.load(fp
,*
,cls=None
,object_hook=None
,parse_float=None
,parse_int=None
,parse_constant=None
,object_pairs_hook=None
,**kw)
Parameters
- fp: a file-like object opened for reading (anything with a read() method), not a file path.
- cls (optional): allows you to use a custom JSONDecoder class.
- object_hook (optional): allows you to place JSON data into a custom object.
- parse_float (optional): allows you to customize how floats are handled when decoding.
- parse_int (optional): allows you to customize how integer values are handled when decoding.
- parse_constant (optional): allows you to customize how the special constants (-Infinity, Infinity, NaN) are handled when decoding.
- object_pairs_hook (optional): allows you to customize how key-value pairs are returned. Overrides the default (dict).
Standard Usage
The name of the fp parameter is a bit misleading. It doesn’t want a file path. It wants an open file object to read the data from. Let’s demonstrate how to do this, assuming you have the following data.json file:
{
"name": "John Doe",
"age": 30,
"city": "New York",
"is_student": false,
"grades": [90, 85, 88]
}
To open this file, you would then run the following code:
import json
with open('data.json', 'r') as json_file:
    data = json.load(json_file)
print(data)
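Since json.load() only needs an object it can read from, it also works with in-memory streams. In the sketch below, an io.StringIO object with made-up content stands in for an open file.

```python
import io
import json

# An in-memory text stream stands in for an open file in this sketch.
fake_file = io.StringIO('{"name": "John Doe", "grades": [90, 85, 88]}')

data = json.load(fake_file)
print(data)
```

This can be handy in tests, where you may not want to touch the file system.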
Converting from Python to JSON
json.dumps()
Enables you to convert Python data structures to a JSON-formatted string.
json.dumps(obj
,*
,skipkeys=False
,ensure_ascii=True
,check_circular=True
,allow_nan=True
,cls=None
,indent=None
,separators=None
,default=None
,sort_keys=False
,**kw)
Parameters
- obj: a serializable Python object such as a dictionary, list, or tuple.
- skipkeys (optional): when True, dictionary keys that are not basic types (str, int, float, bool, None) are skipped instead of raising a TypeError. Default is False.
- ensure_ascii (optional): when True, non-ASCII characters in the output are escaped with \uXXXX sequences. Default is True.
- check_circular (optional): when False, the check for circular references is skipped while serializing the object. Default is True.
- allow_nan (optional): allows NaN, Infinity, and -Infinity to be written as JSON number values. Default is True.
- cls (optional): allows you to customize the serialization process with a JSONEncoder.
- indent (optional): allows you to specify the indentation level to use when pretty-printing the JSON output.
- separators (optional): allows you to specify custom separators to use for the JSON output. It should be a tuple of two strings: the first one is the separator between items, and the second one is the separator between keys and values. Default is (', ', ': ') when indent is None and (',', ': ') otherwise.
- default (optional): allows you to pass a function that will be used for objects that are not serializable by default. Default is None.
- sort_keys (optional): allows you to specify whether to sort the keys of dictionaries before serializing. Default is False.
Standard Usage
import json
data = {
"name": "John Doe",
"age": 30,
"city": "New York",
"is_student": False,
"grades": [90, 85, 88]
}
json_string = json.dumps(data)
print(json_string)
Skip Keys
There may be times when your dictionary has a strange key. By strange, I mean that it does not conform to JSON’s standard and therefore would create an invalid JSON string when converted. To prevent this, Python raises a TypeError if it finds keys in your dictionary that are not str, int, float, bool or None.
To avoid this error and still produce a valid JSON string, you can skip these problematic keys. For our example, let’s say that we have a tuple as one of our keys. Since it isn’t one of the accepted key data types, it will raise an error unless we pass True to the skipkeys parameter.
import json
data = {
(1,2,3): "value1",
"key2": "value2"
}
# json_str_without_skipkeys = json.dumps(data)
# print(json_str_without_skipkeys)
json_str_with_skipkeys = json.dumps(data, skipkeys=True)
print(json_str_with_skipkeys)
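It is worth seeing what happens to the key types that are accepted. JSON object keys are always strings, so the other basic types are converted to strings on the way out. The sketch below uses made-up keys to show this.

```python
import json

# Illustrative keys; each accepted non-string key is written as a string.
data = {7: "int key", 2.5: "float key", False: "bool key", None: "None key"}

json_string = json.dumps(data)
print(json_string)

# Reading the string back does not restore the original key types.
print(json.loads(json_string))
```

Keep this in mind if you rely on integer keys: they come back as strings after a round trip.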
Ensure ASCII
JSON text is usually encoded in UTF-8, which supports a wide range of international characters. By default, however, json.dumps() escapes every non-ASCII character as a \uXXXX sequence, because ensure_ascii defaults to True. This keeps the output safe for systems that only accept ASCII. To keep the characters as they are, set the ensure_ascii parameter to False.
import json
data = {
"name": "José",
"city": "São Paulo",
"country": "日本"
}
json_str_ensure_ascii_true = json.dumps(data, ensure_ascii=True)
print(json_str_ensure_ascii_true)
json_str_ensure_ascii_false = json.dumps(data, ensure_ascii=False)
print(json_str_ensure_ascii_false)
Check Circular
You will rarely need to change the check_circular parameter. A circular reference (for example, a list that contains itself) can never be fully serialized. With the default setting, Python detects the cycle and raises a ValueError. If you set this parameter to False, the check is skipped and you would get a RecursionError (or worse) instead. The important thing to know is that this parameter keeps your code from recursing endlessly.
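To see the check in action, here is a minimal sketch with a dictionary that contains itself. The variable names are made up for illustration.

```python
import json

circular = {"name": "loop"}
circular["self"] = circular  # the dictionary now contains itself

try:
    json.dumps(circular)  # check_circular is True by default
except ValueError as error:
    message = str(error)
    print(message)
```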
Allow NaN
We’ve briefly discussed the special JSON values NaN, Infinity, and -Infinity. The allow_nan parameter gives you control over whether these special values get encoded or not. One of the most common use cases is data validation.
Let’s say that we are sending some JSON data to a system that can’t accept special JSON values. When exporting the data, we want to catch these values so that the system doesn’t run into issues loading our data. We can do that by setting allow_nan equal to False, which makes json.dumps() raise a ValueError when it finds one.
import json
data = {
'Name': float('nan'),
'Price': float('inf'),
'CostOfGoods': float('-inf')
}
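json_str_allow_nan_true = json.dumps(data, allow_nan=True)
print(json_str_allow_nan_true)
# With allow_nan=False, json.dumps raises ValueError instead of writing the special values.
try:
    json_str_allow_nan_false = json.dumps(data, allow_nan=False)
    print(json_str_allow_nan_false)
except ValueError as error:
    print(f"Invalid value found: {error}")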
cls
The cls parameter allows us to pass a custom JSONEncoder class. These user-defined JSONEncoders allow you to create your own JSON serialization process. This is definitely for more advanced users. We’ll address this parameter later, after we show you how to make a JSONEncoder.
Indent
This parameter is particularly useful when you want to make JSON more readable. You can “pretty-print” the output by passing a value to the indent parameter. Keep in mind that this adds more whitespace and increases the file size. For cases where you want to transport data to another application, it’s best to keep the default setting.
import json
data = {
"name": "John",
"age": 30,
"city": "New York",
"interests": ["music", "sports", "reading"],
"is_student": True,
"grades": {"math": 90, "science": 85, "history": 88}
}
json_str_default_indent = json.dumps(data)
print(json_str_default_indent)
json_str_indent_4 = json.dumps(data, indent=4)
print(json_str_indent_4)
Separator
When encoding data into JSON format, we have some flexibility with the separators used in both Objects and Arrays. By default, items are separated by a comma followed by a space, and keys are separated from their values by a colon followed by a space. We can override this by passing the separators parameter a tuple of two strings: the item separator first, then the key-value separator.
You might be wondering why someone might want to do this. One reason is that another system uses a custom parser; keep in mind that output with non-standard separators is no longer valid JSON. Another is to make the output more compact by dropping the spaces. For most purposes, the standard separators are just fine.
import json
data = {
"name": "John",
"age": 30,
"city": "New York",
"interests": ["music", "sports", "reading"],
"is_student": True,
"grades": {"math": 90, "science": 85, "history": 88}
}
json_str_default_separators = json.dumps(data)
print(json_str_default_separators)
json_str_custom_separators = json.dumps(data, separators=(',', ' => '))
print(json_str_custom_separators)
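A more everyday use of separators is trimming whitespace. The sketch below, using made-up sample data, passes (',', ':') to drop the spaces that the default separators add, while still producing valid JSON.

```python
import json

# Illustrative sample data; not taken from the article.
data = {"name": "John", "grades": [90, 85, 88]}

default_output = json.dumps(data)
compact_output = json.dumps(data, separators=(",", ":"))

print(default_output)
print(compact_output)
print(len(default_output), len(compact_output))
```

The savings are small per value but can add up for large payloads.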
Default
There are times when you may want to serialize objects that cannot be serialized by default. Remember our custom ContactInfo class from earlier? This is the perfect example for utilizing the default parameter. We simply need to define a new function that will convert the object to a dictionary and then we can pass it in.
import json
class ContactInfo:
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

    def __str__(self):
        return f"Name: {self.name}\nPhone Number: {self.phone_number}\nEmail: {self.email}"

def contact_to_dict(contact):
    # Called by json.dumps for any object it cannot serialize on its own.
    return {
        'name': contact.name,
        'phone_number': contact.phone_number,
        'email': contact.email
    }

contact = ContactInfo("John Doe", "123-456-7890", "[email protected]")
json_string = json.dumps(contact, default=contact_to_dict)
print(json_string)
This is perfect for on-the-fly serialization. That said, a better way to accomplish this same task is to give the class a method that exports the object as JSON.
import json
class ContactInfo:
    def __init__(self, name, phone_number, email):
        self.name = name
        self.phone_number = phone_number
        self.email = email

    def __str__(self):
        return f"Name: {self.name}\nPhone Number: {self.phone_number}\nEmail: {self.email}"

    def to_json(self):
        # Serialize this object's fields as a JSON string.
        return json.dumps({
            'name': self.name,
            'phone_number': self.phone_number,
            'email': self.email
        })

contact = ContactInfo("John Doe", "123-456-7890", "[email protected]")
json_string = contact.to_json()
print(json_string)
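The default parameter is also useful for standard library types that JSON cannot represent, such as dates. The following is a sketch of that idea; encode_extra is a hypothetical helper name, and writing dates in ISO format is just one reasonable choice.

```python
import json
from datetime import date

def encode_extra(obj):
    # Convert dates to ISO 8601 strings; refuse anything else we don't recognize.
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Illustrative sample data; not taken from the article.
record = {"name": "John Doe", "joined": date(2020, 1, 15)}

json_string = json.dumps(record, default=encode_extra)
print(json_string)
```

Raising TypeError for unknown types keeps the usual error behavior for anything the function does not handle.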
Sort Keys
As the name implies, sort keys allows you to sort the keys when serializing your JSON data. To sort the keys and override the default, simply pass True into the sort_keys parameter.
import json
data = {
"name": "John",
"age": 30,
"city": "New York",
"interests": ["music", "sports", "reading"],
"is_student": True,
"grades": {"math": 90, "science": 85, "history": 88}
}
json_str_unsorted_keys = json.dumps(data, sort_keys=False)
print(json_str_unsorted_keys)
json_str_sorted_keys = json.dumps(data, sort_keys=True)
print(json_str_sorted_keys)
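One practical benefit of sorting is that the output no longer depends on the order in which keys were inserted. The sketch below, with made-up dictionaries, shows two dictionaries with the same content producing identical strings once sort_keys is True.

```python
import json

# Same content, different insertion order (illustrative values).
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

print(json.dumps(first) == json.dumps(second))

first_sorted = json.dumps(first, sort_keys=True)
second_sorted = json.dumps(second, sort_keys=True)
print(first_sorted == second_sorted)
```

This can help when comparing, hashing, or diffing serialized data.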
json.dump()
Enables you to convert Python data structures to JSON and write the result to a file. It takes all of the same parameters as the dumps function, plus the additional fp parameter. As a result, we will only be reviewing “standard usage”.
json.dump(obj
,fp
,*
,skipkeys=False
,ensure_ascii=True
,check_circular=True
,allow_nan=True
,cls=None
,indent=None
,separators=None
,default=None
,sort_keys=False
,**kw)
Parameters
- obj: a serializable Python object such as a dictionary, list, or tuple.
- fp: a file-like object opened for writing (anything with a write() method), not a file path.
- skipkeys (optional): when True, dictionary keys that are not basic types (str, int, float, bool, None) are skipped instead of raising a TypeError. Default is False.
- ensure_ascii (optional): when True, non-ASCII characters in the output are escaped with \uXXXX sequences. Default is True.
- check_circular (optional): when False, the check for circular references is skipped while serializing the object. Default is True.
- allow_nan (optional): allows NaN, Infinity, and -Infinity to be written as JSON number values. Default is True.
- cls (optional): allows you to customize the serialization process with a JSONEncoder.
- indent (optional): allows you to specify the indentation level to use when pretty-printing the JSON output.
- separators (optional): allows you to specify custom separators to use for the JSON output. It should be a tuple of two strings: the first one is the separator between items, and the second one is the separator between keys and values. Default is (', ', ': ') when indent is None and (',', ': ') otherwise.
- default (optional): allows you to pass a function that will be used for objects that are not serializable by default. Default is None.
- sort_keys (optional): allows you to specify whether to sort the keys of dictionaries before serializing. Default is False.
Standard Usage
The name of the fp parameter is a bit misleading. It doesn’t want a file path. It wants an open file object to write the data into.
import json
data = {
"name": "John Doe",
"age": 30,
"city": "New York",
"is_student": False,
"grades": [90, 85, 88]
}
with open('data.json', 'w') as json_file:
    json.dump(data, json_file)
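To confirm that the file holds what you expect, you can read it straight back with json.load(). The sketch below writes to a temporary folder so it leaves nothing behind; the file name and sample data are assumptions for illustration.

```python
import json
import os
import tempfile

# Illustrative sample data; not taken from the article.
data = {"name": "John Doe", "grades": [90, 85, 88]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.json")
    # Write the data out, pretty-printed for readability.
    with open(path, "w") as json_file:
        json.dump(data, json_file, indent=4)
    # Read it back to check the round trip.
    with open(path, "r") as json_file:
        restored = json.load(json_file)

print(restored == data)
```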
Creating a Custom JSONDecoder
The json library gives you the flexibility to create custom JSON decoders. Let’s show you how to create one now!
We’ll create a new class called CustomDecoder. This class will inherit the JSONDecoder class. JSONDecoder has a method called decode so we will be using that, but we will also be redefining what that function does within our CustomDecoder.
We’re going to tell our new decoder to make all string values in our Objects uppercase. And we’ll do the same for all items in our Arrays.
import json
class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        # Let the standard decoder do the parsing first.
        decoded_obj = super().decode(s)
        if isinstance(decoded_obj, dict):
            return {k: v.upper() if isinstance(v, str) else v for k, v in decoded_obj.items()}
        elif isinstance(decoded_obj, list):
            return [item.upper() if isinstance(item, str) else item for item in decoded_obj]
        else:
            return decoded_obj

Now that we’ve successfully created the custom decoder, we can utilize it on some sample data.
json_data = '{"name": "john", "age": 30, "city": "new york"}'
decoder = CustomDecoder()
decoded_data = decoder.decode(json_data)
print(decoded_data)
OUTPUT:
{'name': 'JOHN', 'age': 30, 'city': 'NEW YORK'}
When you run the code, you should now see a Python dictionary with all the string values in an uppercase format.
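This is also where the cls parameter of json.loads() comes in: instead of creating the decoder yourself, you can pass the class and let json.loads() do it. The sketch below uses a shortened stand-in decoder (UppercaseDecoder is a hypothetical name) so that it runs on its own. Note that a decoder written this way only touches top-level values; strings inside nested objects are left as they are.

```python
import json

class UppercaseDecoder(json.JSONDecoder):
    # Hypothetical stand-in that mirrors the decoder described above.
    def decode(self, s):
        decoded = super().decode(s)
        if isinstance(decoded, dict):
            return {k: v.upper() if isinstance(v, str) else v for k, v in decoded.items()}
        return decoded

json_data = '{"name": "john", "age": 30, "address": {"city": "new york"}}'

# json.loads creates the decoder from the class passed as cls.
decoded_data = json.loads(json_data, cls=UppercaseDecoder)
print(decoded_data)
```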
Creating a Custom JSONEncoder
Now that we’ve created a custom decoder, let’s walk through what you can do with JSONEncoder. Sets are a data structure in Python that are not normally JSON serializable. We can change this by writing a custom encoder.
We simply need to create a new class and tell Python to convert any set it finds to a list.
import json
class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects the encoder cannot serialize itself.
        if isinstance(obj, set):
            return list(obj)
        elif hasattr(obj, '__dict__'):
            return obj.__dict__
        else:
            return json.JSONEncoder.default(self, obj)

Let’s test out our new encoder on some data containing a set.
data = {
'name': 'John',
'age': 30,
'interests': {'coding', 'reading', 'traveling'},
'address': {
'street': '123 Main St',
'city': 'Anytown',
'country': 'USA'
}
}
encoded_data = json.dumps(data, cls=CustomJSONEncoder, indent=4)
print(encoded_data)
OUTPUT:
{
"name": "John",
"age": 30,
"interests": [
"reading",
"coding",
"traveling"
],
"address": {
"street": "123 Main St",
"city": "Anytown",
"country": "USA"
}
}
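When you run the code, you should get output like the above. Notice how the set is now a list. Because sets are unordered, the order of the items under interests may differ from run to run.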
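If you need the same output every time, one option is to sort the set before it is written. The following is a sketch of that variation; SortedSetEncoder is a hypothetical name, and it assumes the items in the set can be compared with each other.

```python
import json

class SortedSetEncoder(json.JSONEncoder):
    # Hypothetical variation: write sets as sorted lists for stable output.
    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        return super().default(obj)

# Illustrative sample data; not taken from the article.
data = {"interests": {"reading", "coding", "traveling"}}

encoded_data = json.dumps(data, cls=SortedSetEncoder)
print(encoded_data)

# Decoding gives a list back, since JSON has no set type.
print(json.loads(encoded_data))
```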
Conclusion
The built-in json library is extremely useful for working with JSON data. With its robust functionality and ability to customize how you encode and decode JSON, you can accomplish a lot. If you’re interested in learning more about this topic, check out my course on Udemy!