slogtest

2023-08-16

The Go team recently released Go 1.21, and one of its major features was the addition of slog, a structured logging module, to the standard library. There are a lot of good discussions about using slog, but I’m going to discuss testing a custom slog handler. To start with, let’s talk about what I mean by handlers and custom handlers.

A slog handler controls what slog does with calls to logging methods. More concretely, a handler determines the format of log output. slog itself provides two handlers: one outputs JSON and the other outputs key=value pairs. You choose a handler when you create a slog logger.

jLogger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
tLogger := slog.New(slog.NewTextHandler(os.Stdout, nil))
// Later…
jLogger.Info("hello", "count", 3)
// {"time":"2022-11-08T15:28:26.000000000-05:00","level":"INFO","msg":"hello","count":3}
tLogger.Info("hello", "count", 3)
// time=2022-11-08T15:28:26.000-05:00 level=INFO msg=hello count=3

In addition to the two default handlers, slog makes it relatively easy to create a handler. Jonathan Amsterdam, slog’s author, has even written a guide to writing slog handlers. If you’re interested in writing a handler, I highly recommend that you read the whole guide. But I’m only going to discuss the pros and cons of testing a handler with testing/slogtest.

`slogtest`: Pros

First pro: slogtest finds bugs. In particular, slogtest will help make sure your handler follows all the rules that handlers must follow. Since Jonathan Amsterdam wrote slogtest as well as slog, he’s likely to understand the rules better than most people.

Famously, anecdotes aren’t data, but here are some anecdotes anyway. I wrote a slog handler, and I initially tested it without slogtest (because slogtest didn’t exist yet). Despite the previous tests, slogtest revealed two bugs in my handler. (See this changelog for details.) In order to experiment with slogtest further, I used it on another handler (which also had prior tests), and I discovered a bug there too. (See this pull request for details.) slogtest definitely found bugs in both of these handlers.

Second pro: slogtest is easy to use (for the most part). The module exports only a single function TestHandler. TestHandler takes two arguments, a handler to test and a callback function. The TestHandler function invokes the given handler repeatedly with different inputs and uses the callback function to parse logger output and store the parsed results in a slice of map[string]any. (I’ll say more about parsing and map[string]any below.) Finally, TestHandler checks the results for correctness and returns either an error or nil. (Strictly speaking, slogtest may find multiple errors, but it uses errors.Join to return them as a single item.)

If you were testing a (super fun) handler that produced JSON, your code might look like this. (This is a very lightly adapted version of the parsing example in slogtest’s documentation.)

func TestSlogtest(t *testing.T) {
	var buf bytes.Buffer
	h := module.SuperFunJSONHandler(&buf, nil)
	results := func() []map[string]any {
		ms := []map[string]any{}
		for _, line := range bytes.Split(buf.Bytes(), []byte{'\n'}) {
			if len(line) == 0 {
				continue
			}
			var m map[string]any
			if err := json.Unmarshal(line, &m); err != nil {
				t.Fatal(err)
			}
			ms = append(ms, m)
		}
		return ms
	}
	err := slogtest.TestHandler(h, results)
	if err != nil {
		t.Error(err)
	}
}

Since each call to the logger produces one line, the callback function (results) splits the output (which is stored in a bytes buffer) on newlines and then parses each log using the standard library’s json module. If any of the lines fails to parse, we stop the test run with t.Fatal. In theory, we could simply signal this with an error, but if a JSON handler doesn’t even produce JSON, then we have a bigger problem than whether or not it follows slog’s rules for handlers.

`slogtest`: Cons?

First con: I said that slogtest is easy to use “for the most part,” but what about when it’s not easy to use? The whole point of slog is to provide structured logging. As such, slogtest expects a handler’s output to be parsable. If your handler produces a well-known format, such as JSON or YAML, you can probably find an existing module to parse that format. If your handler produces non-standard output—especially if the output is only semi-structured—then you may have more trouble.

As an example, I wrote a slog handler that produces output like the following. (The format is based on an old blog post by Brandur Leach.)

humaneLogger.Info("hello", "count", 3)
//  INFO | hello | count=3 time="2022-11-08T3:50.26 EDT"

There are three sections separated by " | ". The first section contains the level, the second contains the message, and the third contains all other attributes (including the time attribute, if present) as key=value pairs. Only the third section is fully structured. The level and the message do not include keys

Why is this a problem? Here’s what slogtest’s documentation says:

The results function…should return a slice of map[string]any, one for each call to a Logger output method. The keys and values of the map should correspond to the keys and values of the Handler’s output. Each group in the output should be represented as its own nested map[string]any. The standard keys slog.TimeKey, slog.LevelKey and slog.MessageKey should be used.

If your handler produces output without the appropriate keys, as mine does, or if the handler drops some attributes altogether, then you’ll get failures that may not indicate a bug. In order to avoid this, you probably need to write a fairly kludgy custom parser. As an example, here’s what I used for my handler. (Full disclosure: I based this on slog’s own test code for its text handler. Credit Jonathan Amsterdam for anything elegant here, and blame me for the kludges.)

func parseHumane(bs []byte) (map[string]any, error) {
	top := map[string]any{}
	s := string(bytes.TrimSpace(bs))
	// Humane divides each line into three parts (level, message, kv pairs)
	// and delimits these three parts by " | ".  We can use strings.Split
	// to get at the pieces.  Then we need to create proper key-value pairs
	// for level and message.
	pieces := strings.Split(s, " | ")
	top[slog.LevelKey] = strings.TrimSpace(pieces[0])
	top[slog.MessageKey] = strings.TrimSpace(pieces[1])
	// The rest of the line contains kv pairs that we can (roughly) divide
	// by spaces. This is crude since it will split a quoted key or value
	// that contains a space. For this test, however, this will work---as
	// long as I make sure to set a time format without whitespace.
	s = pieces[2]
	for len(s) > 0 {
		kv, rest, _ := strings.Cut(s, " ")
		k, value, found := strings.Cut(kv, "=")
		if !found {
			return nil, fmt.Errorf("no '=' in %q", kv)
		}
		keys := strings.Split(k, ".")
		// Populate a tree of maps for a dotted path such as "a.b.c=x".
		m := top
		for _, key := range keys[:len(keys)-1] {
			x, ok := m[key]
			var m2 map[string]any
			if !ok {
				m2 = map[string]any{}
				m[key] = m2
			} else {
				m2, ok = x.(map[string]any)
				if !ok {
					return nil, fmt.Errorf("value for %q in composite key %q is not map[string]any", key, k)
				}
			}
			m = m2
		}
		m[keys[len(keys)-1]] = value
		s = rest
	}
	return top, nil
}

This (so-called) parser assumes a lot, and it would fail in all sorts of cases. However, slogtest is very forgiving. The guide to writing handlers explains:

If you have to write your own parser, it can be far from perfect. The slogtest package uses only a handful of simple attributes. (It is testing handler conformance, not parsing.) Your parser can ignore edge cases like whitespace and newlines in keys and values. Before switching to a YAML parser, we wrote an adequate custom parser in 65 lines.

To give one more example, tint’s output is even more difficult to parse. First, the output is colored using ANSI escape codes. Happily, you can turn that feature off when configuring a tint handler. Beyond that, the output looks like this.

tintLogger.Info("hello", "count", 3)
// 3:50PM INF hello count=3

The time, level, and message come first, and none of them has a standard key. Since the time may be omitted, you can’t simply split the line on spaces and assume that the first item is a time, the second is the level, the third is the message, and the rest are key=value pairs. You have to check whether the first element parses as a time in a given format, and then process the rest of the line accordingly. In any case, I wrote an even kludgier parser for tint. Kludgy as the code is, it was good enough to find a bug.

Second con: this may be a “me problem,” but initially I didn’t fully absorb part of the documentation for slogtest.

Each group in the output should be represented as its own nested map[string]any.

This sentence means that the callback function must return a map[string]any with a proper tree structure. slog can collect key-value pairs into groups and subgroups, and slogtest expects each group and subgroup to be stored in the right place as a dependent map[string]any in the outermost map. Since the handler I wrote is effectively flat rather than hierarchical, I wasn’t thinking of my output as a tree. As a result, I wasn’t sure how to satisfy slogtest at first. That said, once you know what slogtest expects, you can probably write a working parser without too much trouble. (You are also probably much more comfortable with tree structures than I am.)

The Bottom Line

Try slogtest if you write a slog handler. You may need to write a parser, but as the guide to writing handlers says, you don’t need to write a complete or perfect parser. A parser that would fail under all sorts of real conditions can be good enough. As long as the parser can handle unquoted, space-separated items, you will probably be fine. And, as I’ve said, slogtest will catch bugs.

Last Thoughts

Maybe it would be easier if slogtest had two modes. In the first mode, slogtest would check everything: the standard attributes for time, level, and message, but also any further key=value pairs. In the second mode, slogtest would only check the key=value pairs. After looking at slogtest’s source, I think that this would be an easy change. (You would only need to split the test cases into two groups and adjust a small number of the test cases for the second mode.)

My guess, however, is that the Go maintainers would not accept a pull request or proposal like this. And I don’t blame them. For the most part, slogtest handlers will produce structured output that is (relatively) easy to parse. And in cases like humane and tint, users can write ugly parsers that are good enough for the job at hand.

Finally, even if you use slogtest, you will need to write additional tests. slogtest only checks whether a handler conforms to slog’s rules for handlers. It doesn’t check whether your handler does whatever distinguishes it from existing handlers.

Update (2023-08-17)

I was wrong: Go’s maintainers would consider a request to change slogtest, and someone has already made such a proposal. The proposal isn’t exactly what I had in mind, but it should make it easier to control slogest in ways that would help less structured handlers like humane and tint. After reading the discussion on GitHub, I suspect the proposal will be accepted. (Thanks to Jonathan Amsterdam, who told me about the proposal in a comment on Lobste.rs.)