Technical April 10, 2024

Why We Moved SSH Session Recording to the Edge

A technical look at why proxy-based session recording is a flawed architecture for modern infrastructure, and how we implemented node-mode recording at the PTY level in Go.

The Bastion Bottleneck

When building an SSH access platform for compliance-heavy environments, session recording is an absolute requirement. You need an immutable, auditable trail of every command executed and every byte of output returned during an SSH session.

The traditional architectural approach to this problem is proxy-based recording. The user connects to a central bastion host, the bastion establishes a second connection to the target node, and the bastion records the bytes flowing between them.

For Mezite, we initially started down this path. Our architecture consists of a central Auth/Proxy service (mezhub) and lightweight agents (mezd) running on target nodes. The proxy routes connections via reverse tunnels. It seemed natural to intercept and record the traffic at the proxy layer.

But as we built it, we realized that proxy-side recording is fundamentally flawed for a modern, scalable access platform.

The Flaws of Proxy-Based Recording

1. The Protocol Framing Problem

If your proxy simply acts as a transparent TCP relay—shuttling bytes from the client to the agent’s reverse tunnel—the data stream you intercept is not a clean terminal session. It is an encrypted, multiplexed SSH protocol stream.

Even if the proxy terminates the outer TLS/SSH transport to inspect the payload, what it sees are SSH channel data messages, flow-control window adjustments, window-change requests, keep-alives, and global requests. If you record this raw protocol framing to disk, playing it back in a web browser requires a full SSH protocol parser in JavaScript, which is brittle and hard to maintain.

2. The “Heavy Proxy” Anti-Pattern

To get clean terminal I/O (just the ANSI escape sequences, stdout, and stderr) at the proxy layer, the proxy must become a full “Man-in-the-Middle” SSH server.

It must terminate the client’s SSH session, allocate pseudo-terminals, parse PTY requests, and then act as an SSH client to establish a brand new session to the target node.

This requires the proxy to decrypt, buffer, process, and re-encrypt every single byte of traffic. The proxy becomes a massive CPU and memory bottleneck. It breaks end-to-end encryption between the user and the node, and it adds significant latency to interactive sessions.

Our Decision: Node-Mode Recording

We made a deliberate architectural decision to push session recording to the edge. Instead of recording at the proxy, we record directly on the target node inside the Mezite agent (mezd).

By pushing recording to the edge, the proxy remains a “dumb,” high-throughput router. It blindly shuttles encrypted packets over the reverse tunnel.

Capturing at the PTY Level

When a user initiates an interactive SSH session, the agent’s SSH server receives a pty-req containing the requested terminal modes and window dimensions.
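To make the pty-req payload concrete, here is a minimal sketch of decoding it with only the standard library. The field layout follows RFC 4254, section 6.2 (TERM string, width in columns, height in rows, pixel width, pixel height, encoded terminal modes). The names ptyRequest, readString, and parsePtyReq are illustrative assumptions, not Mezite's actual helpers.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// ptyRequest holds the fields of an SSH "pty-req" payload that a
// server typically needs. Hypothetical type, for illustration only.
type ptyRequest struct {
	Term  string
	Cols  uint32
	Rows  uint32
	Modes []byte // encoded terminal modes, left undecoded here
}

var errShortPayload = errors.New("pty-req: payload too short")

// readString reads one SSH "string": a big-endian uint32 length
// followed by that many bytes. It returns the string and the remainder.
func readString(b []byte) (s, rest []byte, err error) {
	if len(b) < 4 {
		return nil, nil, errShortPayload
	}
	n := binary.BigEndian.Uint32(b)
	if uint64(len(b)-4) < uint64(n) {
		return nil, nil, errShortPayload
	}
	return b[4 : 4+n], b[4+n:], nil
}

// parsePtyReq decodes TERM, the character dimensions, and the raw
// terminal modes. The two pixel-size fields are skipped.
func parsePtyReq(payload []byte) (*ptyRequest, error) {
	term, rest, err := readString(payload)
	if err != nil {
		return nil, err
	}
	if len(rest) < 16 { // four uint32 fields: cols, rows, width px, height px
		return nil, errShortPayload
	}
	req := &ptyRequest{
		Term: string(term),
		Cols: binary.BigEndian.Uint32(rest[0:4]),
		Rows: binary.BigEndian.Uint32(rest[4:8]),
	}
	modes, _, err := readString(rest[16:])
	if err != nil {
		return nil, err
	}
	req.Modes = modes
	return req, nil
}
```

A helper like the parseDimensions call in the agent code would only need the Cols and Rows fields from a structure like this.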

Instead of dealing with SSH channels, we intercept the I/O at the operating system level. We use the creack/pty Go library to allocate a real pseudo-terminal (PTMX/PTS pair).

server/agent/ssh_server.go (simplified)

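// Parse the terminal dimensions (columns, rows) out of the pty-req payload
w, h := parseDimensions(payload)

// Start the shell attached to a real PTY; ptmx is the master side
cmd := exec.Command("bash")
cmd.Env = append(os.Environ(), "TERM=xterm-256color")

ptmx, err := pty.StartWithSize(cmd, &pty.Winsize{Rows: h, Cols: w})
if err != nil {
	return err
}
defer ptmx.Close()

// Copy PTY output -> channel, recording output chunks simultaneously
go func() {
	buf := make([]byte, 32*1024)
	for {
		n, readErr := ptmx.Read(buf)
		if n > 0 {
			// Record first, so output is captured even if the client
			// has already gone away. Recording errors are ignored in
			// this simplified version.
			if rec != nil {
				_ = rec.WriteChunk(session.DirOutput, buf[:n])
			}
			// Stop copying once the SSH channel can no longer be written
			if _, err := channel.Write(buf[:n]); err != nil {
				return
			}
		}
		if readErr != nil {
			// The shell exited or the PTY was closed
			return
		}
	}
}()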

Because we are capturing data directly from the PTY file descriptor before it is framed into SSH channel messages, the recorded output is a perfectly clean stream of terminal data. It contains the raw ANSI escape codes and text, which is trivial to stream to an xterm.js frontend in our web UI.
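Replaying a session in the browser also needs to know when each chunk arrived and in which direction it flowed. The post does not describe Mezite's on-disk format, so the following is only a sketch of one possible framing: a fixed header (direction, time offset, length) followed by the raw terminal bytes. The names direction, chunk, encodeChunk, and decodeChunk are assumptions made for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"time"
)

// direction marks which way a chunk flowed through the PTY.
type direction byte

const (
	dirInput  direction = 0 // client -> shell
	dirOutput direction = 1 // shell -> client
)

// chunk is one recorded read from (or write to) the PTY.
type chunk struct {
	Dir    direction
	Offset time.Duration // time since the session started
	Data   []byte        // raw terminal bytes, ANSI escapes included
}

// headerLen is 1 byte direction + 8 bytes offset + 4 bytes length.
const headerLen = 1 + 8 + 4

// encodeChunk serializes a chunk as header followed by payload.
func encodeChunk(c chunk) []byte {
	out := make([]byte, headerLen+len(c.Data))
	out[0] = byte(c.Dir)
	binary.BigEndian.PutUint64(out[1:9], uint64(c.Offset))
	binary.BigEndian.PutUint32(out[9:13], uint32(len(c.Data)))
	copy(out[headerLen:], c.Data)
	return out
}

// decodeChunk reads one chunk from the front of b and returns the
// remaining bytes, so a player can walk a whole recording in a loop.
func decodeChunk(b []byte) (c chunk, rest []byte, err error) {
	if len(b) < headerLen {
		return chunk{}, nil, errors.New("recording: truncated chunk header")
	}
	n := binary.BigEndian.Uint32(b[9:13])
	if uint64(len(b)-headerLen) < uint64(n) {
		return chunk{}, nil, errors.New("recording: truncated chunk body")
	}
	end := headerLen + int(n)
	c = chunk{
		Dir:    direction(b[0]),
		Offset: time.Duration(binary.BigEndian.Uint64(b[1:9])),
		Data:   b[headerLen:end],
	}
	return c, b[end:], nil
}
```

With a framing like this, a player can write each chunk's Data to the terminal emulator after waiting for the difference between consecutive offsets, without parsing any SSH messages.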

Handling Window Changes

When the user resizes their terminal window, their SSH client sends a window-change request to the server. Since our agent controls the PTY, handling this is as simple as passing the new dimensions directly to the PTY via pty.Setsize.

server/agent/ssh_server.go (simplified)

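// Handle channel requests while the command runs
go func() {
	for req := range requests {
		switch req.Type {
		case "window-change":
			// Resize the PTY; the kernel delivers SIGWINCH to the shell
			ws := parseWindowChange(req.Payload)
			if ws != nil {
				_ = pty.Setsize(ptmx, ws)
			}
			if req.WantReply {
				_ = req.Reply(ws != nil, nil)
			}
		default:
			// Reject anything else explicitly; a client that asked for
			// a reply would otherwise wait for an answer that never comes
			if req.WantReply {
				_ = req.Reply(false, nil)
			}
		}
	}
}()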
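For reference, the window-change payload is four big-endian uint32 values (RFC 4254, section 6.7): columns, rows, pixel width, pixel height. Below is a minimal sketch of what a helper like parseWindowChange could look like. To stay self-contained it returns a local winsize type that is assumed to mirror the fields of creack/pty's Winsize (Rows, Cols, X, Y as uint16); the real helper may differ.

```go
package main

import "encoding/binary"

// winsize is a stand-in for pty.Winsize so the sketch compiles
// without external dependencies. X and Y are the size in pixels.
type winsize struct {
	Rows, Cols, X, Y uint16
}

// clamp16 narrows a uint32 to uint16, saturating instead of wrapping.
func clamp16(v uint32) uint16 {
	if v > 0xFFFF {
		return 0xFFFF
	}
	return uint16(v)
}

// parseWindowChange decodes a "window-change" request payload.
// It returns nil for a truncated payload or unusable dimensions,
// so callers can skip the resize.
func parseWindowChange(payload []byte) *winsize {
	if len(payload) < 16 {
		return nil
	}
	cols := binary.BigEndian.Uint32(payload[0:4])
	rows := binary.BigEndian.Uint32(payload[4:8])
	widthPx := binary.BigEndian.Uint32(payload[8:12])
	heightPx := binary.BigEndian.Uint32(payload[12:16])

	// A zero-sized or oversized terminal is not something to apply.
	if cols == 0 || rows == 0 || cols > 0xFFFF || rows > 0xFFFF {
		return nil
	}
	return &winsize{
		Rows: uint16(rows),
		Cols: uint16(cols),
		X:    clamp16(widthPx),
		Y:    clamp16(heightPx),
	}
}
```

Note the ordering: the wire format puts columns before rows, while the winsize struct puts rows first, which is an easy place to swap the two by accident.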

Encrypted at the Source

Recording at the edge allows us to implement strong security guarantees. The agent encrypts the recording chunks in memory using AES-256-GCM before writing them to disk or streaming them out.
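As an illustration of per-chunk encryption, here is a minimal sketch using Go's standard crypto/aes and crypto/cipher packages. Several details are assumptions not stated above: each session has its own 32-byte key, every chunk gets a fresh random nonce that is stored in front of the ciphertext, and the session ID is bound in as additional authenticated data. The function names sealChunk, openChunk, and newGCM are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-256-GCM AEAD and rejects shorter AES keys.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("recording: AES-256 needs a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealChunk encrypts one recording chunk. The output layout is
// nonce || ciphertext || tag. The session ID is authenticated but
// not encrypted, so a chunk cannot be replayed into another session.
func sealChunk(key, sessionID, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext after the nonce.
	return gcm.Seal(nonce, nonce, plaintext, sessionID), nil
}

// openChunk reverses sealChunk and fails if the chunk, the nonce,
// or the session ID was modified.
func openChunk(key, sessionID, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("recording: sealed chunk too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, sessionID)
}
```

Random 96-bit nonces are only safe for a bounded number of chunks under one key, which is one reason a per-session key is assumed here. A production recorder would also reuse the AEAD across chunks rather than rebuilding it on every call.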

When the session ends, the agent securely uploads the encrypted recording to the Auth server (or directly to an S3/MinIO bucket). The central proxy never sees the plaintext terminal output, preserving end-to-end encryption for the session data while still satisfying compliance requirements.

What About Agentless Nodes?

There is one exception: agentless OpenSSH nodes.

Mezite supports connecting to standard OpenSSH servers using ephemeral CA-signed certificates, without requiring the Mezite agent. In this scenario, we cannot run our node-mode recording logic.

For these nodes, we do fall back to proxy-mode recording. The proxy must terminate the session and act as a full bastion. However, we treat this as a “best-effort” fallback. The architectural tradeoff is explicit: if you want high-throughput, end-to-end encrypted, clean session recording, you deploy the agent. If you cannot deploy the agent, you accept the proxy overhead.

Conclusion

By moving session recording to the edge, we stripped massive complexity out of our control plane. For nodes running the agent, the proxy doesn’t need to parse PTY requests, handle window size changes, or buffer terminal output. It just routes packets.

The result is a significantly faster SSH experience for users, a more scalable control plane for administrators, and a cleaner audit trail for compliance teams.

Mezite Team

Engineering