Python logging module quirks

Introduction

Python comes with a default, ready-to-use logging module. That's nice because we don't have to install additional libraries to start logging in our application. However, for someone coming from another language background, such as JavaScript, TypeScript, Java, C#, or Go, the Python logging module might resemble familiar tools, but it has its specifics. In this article, I would like to cover some quirks that I found along the way while recently working on a Python project.

Read the documentation

Yeah, that would be a sensible thing to do. Make a cup of coffee, relax, and read the logging module documentation [1], all 13 sub-pages of it, of course 🙂. Unfortunately, not everyone has the time to do so. I usually skim through the documentation, but when I deal with multiple programming languages daily, it's hard to hold all the language and framework specifics in my head. Anyway, enough about me and my saga with the Python documentation; let's look at the logging module.

Log levels

Why do we need logging levels? Good question. Depending on how we run our application, we might want different types of logs. For example, during development we would like to see verbose logs of what exactly happens in the application. But when we're ready for production, we would probably want to limit our logs to errors only - otherwise, we will have a lot of logs, most of which are probably going to be just noise. By noise, I mean they will not carry useful information while simultaneously being very verbose and distracting. So we need to understand what kind of information we need in our logs. Having everything everywhere usually will not get us far.

The logging module comes with several levels [2]:

CRITICAL = 50
ERROR = 40
WARNING = 30
INFO = 20
DEBUG = 10
NOTSET = 0

I think the names are quite self-explanatory, but you can always go back to the original docs. Each level has a number assigned to it. The higher the number, the higher its priority. For example, if we set the log level to WARNING, we will only get logs with a priority of WARNING or above, i.e. 30 or higher. Similarly, if we set the root logger to the NOTSET level, we will get all the logs.
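
To make the numeric comparison concrete, here's a minimal sketch using the standard level constants and Logger.isEnabledFor:

import logging

logger = logging.getLogger('demo')
logger.setLevel(logging.WARNING)

print(logging.WARNING)                    # 30
print(logger.isEnabledFor(logging.INFO))  # False - 20 < 30, filtered out
print(logger.isEnabledFor(logging.ERROR)) # True - 40 >= 30, passes through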

Let's run an example logging Python application:

import logging

logging.debug('test')
logging.info('test')
logging.warning('test')
logging.error('test')
logging.critical('test')

Expected output:

WARNING:root:test
ERROR:root:test
CRITICAL:root:test

Where did my logs go? We have only WARNING, ERROR, and CRITICAL 🤔. As mentioned above, it's all about the priority level: the root logger's default level is WARNING, so DEBUG and INFO messages are filtered out.
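
We can confirm the default with a quick check; getEffectiveLevel is a standard Logger method:

import logging

# The root logger defaults to WARNING (30) until we configure it otherwise.
root = logging.getLogger()
print(root.getEffectiveLevel())                        # 30
print(logging.getLevelName(root.getEffectiveLevel()))  # WARNING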

Let's set the log level to DEBUG:

import logging

logging.basicConfig(level=logging.DEBUG)

logging.debug('test')
logging.info('test')
logging.warning('test')
logging.error('test')
logging.critical('test')

Run it, and observe the output:

$ python main.py 
DEBUG:root:test
INFO:root:test
WARNING:root:test
ERROR:root:test
CRITICAL:root:test

As expected, we get all the logging levels 🥳.

Logging to files

Similarly to what we did above, we can log to a local file:

import logging

logging.basicConfig(filename='test.log', filemode='w')
logging.warning('test')

Expected test.log file content:

WARNING:root:test

There are more configuration options available [3].
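
For instance, a few of the documented basicConfig options combined (the file name here is just an example):

import logging

# Write DEBUG and above to app.log, appending instead of overwriting.
logging.basicConfig(
    filename='app.log',   # example file name
    filemode='a',
    level=logging.DEBUG,
)

logging.debug('configured with a few more options')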

Formatting the output

When multiple components of the same application write log messages, it's natural to want them to use the same format; this way, we can apply a consistent configuration for message content, such as process ID, level, and module name, across our application.

import logging

logging.basicConfig(format='%(process)d-%(levelname)s-%(message)s')
logging.error('test')

Same here; more formatting options are in the docs [4].
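
For example, here is a slightly richer format string built from a few other standard LogRecord attributes (asctime, name, filename, and lineno are all documented):

import logging

logging.basicConfig(
    format='%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d %(message)s'
)

logging.error('test')
# Example output (timestamp and file name will vary):
# 2024-01-01 12:00:00,000 ERROR root main.py:7 test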

Capturing stack traces

What happens when we have an error? What information are we expected to see? Usually, we would want a short description of what happened and a stack trace. A stack trace provides information about the execution path: how we got to the error, the function names, calls, and execution context.

But first, let's see if we can guess what the following code will log. We're going to raise a ZeroDivisionError exception intentionally:

import logging

try:
  1 / 0
except Exception as e:
  logging.error(e)

Looks ok, right? Well, that would just log:

$ python ./main.py 
ERROR:root:division by zero

I don't know about you, but the first time I saw this log in production, I was pretty puzzled by it. If you made it this far, I salute you 🫡. That's one of the quirks I was referring to in my intro.

We need to add the exc_info=True parameter or use the logging.exception function to get the stack trace information.

import logging

try:
  1 / 0
except Exception as e:
  logging.error('OOPS', exc_info=True)
  # OR: logging.exception('OOPS')

Expected output:

ERROR:root:OOPS
Traceback (most recent call last):
  File "...py", line 4, in <module>
    1 / 0
ZeroDivisionError: division by zero

Yep. We got our stack trace. Let's move on 🚀.

Module-level loggers, namespaces, or however you want to call them

So far, we have used the default logger, aka the root logger. That's all fine, but most Python applications consist of multiple modules, all of them logging. In that case, we would probably want to differentiate which logs come from which modules. I mean, why not?

Let's see an example of such module-level logs [5]:

import logging

module_logger = logging.getLogger('module_logger')
module_logger.warning('module_logger log message')

logging.warning('module_logger log message')

Expected output:

module_logger log message
WARNING:root:module_logger log message

We created a custom logger instance named module_logger. Unlike with the root logger, its name is not part of the output. But have no fear! We can add it to our output via formatting [4]:

import logging

logging.basicConfig(format='%(name)s %(message)s',)
module_logger = logging.getLogger('module_logger')
module_logger.warning('test')
logging.warning('test')

Observe the output:

module_logger test
root test

Kind of cool, I would say 😎.
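
A common convention is to name each logger after the module it lives in via the __name__ variable, so every module gets its own namespaced logger for free. A minimal sketch, assuming a hypothetical module file called payments.py:

# payments.py (hypothetical module name, for illustration only)
import logging

# __name__ resolves to 'payments' when this module is imported, so every
# log record from here is automatically tagged with that name.
logger = logging.getLogger(__name__)

def charge(amount):
    logger.info('charging %s', amount)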

Handlers

Let's have a look now at log handlers [6]. Handlers are helpful when we want to (surprisingly) handle logs in multiple ways, for example, by sending the output to various destinations. Each logger might have multiple handlers.

Example:

import logging

logger = logging.getLogger(__name__)

logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.FileHandler('out.log'))

logger.warning('test')
logger.error('test')

Let's run it. We should see the message both in our terminal and in the local file named out.log. Notice that the file will not be overwritten when we re-run the program; the content will be appended. One common use of logging handlers is sending logs over the network, for example, to log aggregation services.
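
As a sketch of the network case, the standard library ships logging.handlers.SocketHandler, which pickles log records and sends them over TCP; the host and port below are placeholders for whatever collector you run:

import logging
import logging.handlers

logger = logging.getLogger(__name__)

# Placeholder host/port - point this at your own log collector.
socket_handler = logging.handlers.SocketHandler('localhost', 9020)
logger.addHandler(socket_handler)

logger.error('this record is sent over TCP as well')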

Structured Logs

We briefly covered log formatting in the previous sections. That helps with log structure. But once we start configuring our logger, we might find ourselves writing and maintaining boilerplate code. I found jsonlogger [7] very useful for JSON-structured logs. It is easy to configure, and it just works. Once installed (the PyPI package is called python-json-logger), here's an example of how to use it:

import logging
from pythonjsonlogger import jsonlogger

root_logger = logging.getLogger()
logHandler = logging.StreamHandler()
logHandler.setFormatter(jsonlogger.JsonFormatter())

root_logger.addHandler(logHandler)

root_logger.setLevel(logging.DEBUG)

logger = logging.getLogger('test')
logger.info('test')
logger.warning('test')
logger.error('test')

Expected output:

{"message": "test"}
{"message": "test"}
{"message": "test"}

JSON-structured logs are useful when processing log files for analytics or BI (business intelligence), as well as for searching. Searching and correlating log messages is very valuable when developing and troubleshooting.

Logs and Unix streams

In Unix-based systems, we have the stdin, stdout, and stderr streams. Unix streams, like everything else in Unix, are treated as files 🧐. What I mean is that we can read and write to streams as if we were handling files. Well, kind of.

Ok, cool. How is this relevant to this post? Let's run one of the logging applications from above and check which stream it is using:

import logging

logger = logging.getLogger(__name__)

logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.FileHandler('out.log'))

logger.warning('test')
logger.error('test')

We can run it and redirect each stream to a separate file:

$ python logs.py 1> stdout.log 2> stderr.log

Here, 1 stands for stdout and 2 for stderr. We redirect the stdout stream to the stdout.log file and stderr to stderr.log. More info about redirecting output can be found at [8].

If we check our stderr.log file, it has all the content. How come? Was it an actual error? No. Well, that's the default behavior of the Python logging module: StreamHandler writes to stderr unless told otherwise. I found it quite surprising 😶. It's problematic because if our application sends logs to some kind of log aggregation service such as Logstash (elastic.co/logstash) or Datadog (datadoghq.com), our logs may be identified as errors when in reality they are not.
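
A quick way to see this default for yourself - StreamHandler falls back to sys.stderr when no stream is given:

import sys
import logging

handler = logging.StreamHandler()     # no stream argument
print(handler.stream is sys.stderr)   # True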

We can add the stream parameter to point to stdout:

logger.addHandler(logging.StreamHandler(stream=sys.stdout))  # requires import sys

We should be able to locate the log content in stdout.log now.

Segregating logs between stdout and stderr

When our application is monitored by other services, it can be useful to segregate error logs from the rest.

One way to achieve this is to define multiple handlers:

import sys
import logging
from pythonjsonlogger import jsonlogger

class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super(CustomJsonFormatter, self).add_fields(log_record, record, message_dict)
        log_record["level"] = record.levelname
        log_record["name"] = record.name
        log_record["message"] = record.getMessage()
        if record.levelname == "ERROR" or record.levelname == "EXCEPTION":
            log_record["exception"] = record.exc_info
            log_record["stacktrace"] = record

formatter = CustomJsonFormatter("%(level)s %(name)s %(message)s")

stdoutHandler = logging.StreamHandler(stream=sys.stdout)
stdoutHandler.setLevel(logging.DEBUG)
stdoutHandler.setFormatter(formatter)

stderrHandler = logging.StreamHandler(stream=sys.stderr)
stderrHandler.setLevel(logging.ERROR)
stderrHandler.setFormatter(formatter)

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)

root_logger.addHandler(stderrHandler)
root_logger.addHandler(stdoutHandler)

However, that results in duplicated logs; ERROR logs will end up in both the stderr and stdout streams. That's far from ideal.

But! We can use the built-in Filter objects. Example:

class StdoutFilter(logging.Filter):
    def filter(self, record):
        # ROOT_LEVEL is defined in the full example below
        return record.levelno >= ROOT_LEVEL and record.levelno < logging.ERROR

Here we defined a class that inherits from logging.Filter. For each log record, it decides whether the record should be handled by the handler or not.

import sys
import logging
from pythonjsonlogger import jsonlogger


class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super(CustomJsonFormatter, self).add_fields(
            log_record, record, message_dict)
        log_record["level"] = record.levelname
        log_record["name"] = record.name
        log_record["message"] = record.getMessage()
        if record.levelname == "ERROR" or record.levelname == "EXCEPTION":
            log_record["exception"] = record.exc_info
            log_record["stacktrace"] = record


ROOT_LEVEL = logging.DEBUG


class StdoutFilter(logging.Filter):
    def filter(self, record):
        return record.levelno >= ROOT_LEVEL and record.levelno < logging.ERROR


class StderrFilter(logging.Filter):
    def filter(self, record):
        return record.levelno >= logging.ERROR


formatter = CustomJsonFormatter("%(level)s %(name)s %(message)s")

stdoutHandler = logging.StreamHandler(stream=sys.stdout)
stdoutHandler.setFormatter(formatter)
stdoutHandler.addFilter(StdoutFilter())

stderrHandler = logging.StreamHandler(stream=sys.stderr)
stderrHandler.setFormatter(formatter)
stderrHandler.addFilter(StderrFilter())


root_logger = logging.getLogger()
root_logger.setLevel(ROOT_LEVEL)

root_logger.addHandler(stderrHandler)
root_logger.addHandler(stdoutHandler)


logging.debug('test')
logging.info('test')
logging.warning('test')
logging.error('test')
logging.critical('test')

Run it:

$ python logs.py 1> stdout.log 2> stderr.log

And observe file content:

stdout.log:

{"level": "DEBUG", "name": "root", "message": "test"}
{"level": "INFO", "name": "root", "message": "test"}
{"level": "WARNING", "name": "root", "message": "test"}

stderr.log:

{"level": "ERROR", "name": "root", "message": "test", "exception": null, "stacktrace": "<LogRecord: root, 40, ...main.py, 53, \"test\">"}
{"level": "CRITICAL", "name": "root", "message": "test"}

Another promising library I've encountered is loguru [9]. However, I couldn't figure out how to configure stream segregation there.

Summary

The Python logging module should be flexible enough to fit most general purposes. First, however, we need to understand how it works, especially if we come from another programming language background. The Python logging module's default behavior and the difference between the root and module loggers were alien to me. But I survived 💪. I hope this makes sense and gives a high-level overview of how to use the logging module and its configurations.

Another thing to mention is that all these logging examples could also have been implemented with configuration files [10].
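
For completeness, here's a minimal sketch of the same idea using logging.config.dictConfig; the keys follow the standard dictConfig schema, and the format string is just an example:

import logging
import logging.config

LOGGING_CONFIG = {
    'version': 1,
    'formatters': {
        'plain': {'format': '%(name)s %(levelname)s %(message)s'},
    },
    'handlers': {
        'stdout': {
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'formatter': 'plain',
        },
    },
    'root': {'level': 'DEBUG', 'handlers': ['stdout']},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger(__name__).info('configured from a dict')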

References

[1] docs.python.org/3/library/logging.html

[2] docs.python.org/3/library/logging.html#logg..

[3] docs.python.org/3/library/logging.html#logg..

[4] docs.python.org/3/library/logging.html#form..

[5] docs.python.org/3/library/logging.html#modu..

[6] docs.python.org/3/library/logging.handlers...

[7] pypi.org/project/python-json-logger

[8] gnu.org/software/bash/manual/bash.html#Redi..

[9] loguru.readthedocs.io/en/stable/overview.html

[10] docs.python.org/3.10/howto/logging-cookbook..
