This page explains the structure of Cmd data, and how it appears when you export it for additional analysis or long-term storage. The full guide is available as a PDF with descriptions of each JSON field, as well as important background information on the data included with each event, such as the event's Linux process context and TTY device major and minor numbers.

Introduction

Cmd can export recorded Linux events as newline-delimited JSON to objects in your S3 and GCS buckets. These JSON objects share a common subset of fields, most notably event_type. The value of event_type is one of EXEC, BUILTIN, or BACKFILL, and determines which additional fields may be present in the object.

This document starts with a conceptual overview, followed by sections that describe the JSON object fields of the EXEC, BUILTIN and BACKFILL events.

Note: In the future, Cmd may add fields to the schema. Please ensure your data processing tools are configured to ignore unknown fields.
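As a minimal sketch of this advice, the Python below reads lines from an export object that has already been downloaded, groups events by event_type, and inspects only the field it needs, so any fields added in the future pass through untouched. The function names are hypothetical, and fetching objects from S3 or GCS is out of scope. Counting unrecognized event_type values instead of failing is an extra precaution of this sketch, not a requirement of the schema.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# The three event types described on this page.
KNOWN_EVENT_TYPES = ("EXEC", "BUILTIN", "BACKFILL")


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-blank line (newline-delimited JSON)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_by_event_type(events: Iterable[dict]) -> Tuple[Dict[str, List[dict]], int]:
    """Group events by event_type and count events with an unrecognized type.

    Only event_type is inspected, so unknown fields are kept, not rejected.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    unrecognized = 0
    for event in events:
        event_type = event.get("event_type")
        if event_type in KNOWN_EVENT_TYPES:
            groups[event_type].append(event)
        else:
            unrecognized += 1
    return groups, unrecognized


if __name__ == "__main__":
    sample = [
        '{"event_type": "EXEC", "some_future_field": 1}',
        '{"event_type": "BUILTIN"}',
        "",
    ]
    groups, unrecognized = group_by_event_type(iter_events(sample))
    print({name: len(items) for name, items in groups.items()}, unrecognized)
```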

Cmd Audit and Cmd Control

This data schema applies to both the Cmd Control and Cmd Audit agents, with some caveats:

  • The Cmd Audit agent does not record BUILTIN events.

  • Cmd Control agent versions before v1.4.0 use the legacy data export schema.

Events: Linux process context

Many exported events contain information about their Linux process context. This valuable information lets you determine who really created a process, regardless of any local change of user identity made with sudo or su. It includes each process' controlling terminal, as well as whether its stdin, stdout, and stderr are tied to that controlling terminal. This makes it easier to differentiate between interactive processes started by humans and those started by services such as web servers.

There is a delicate balance between including too much and too little information with each event. More information per event simplifies processing data from S3 and GCS, because each event's context is self-contained and little or no information from other events is needed. Less information per event means lower bandwidth, RAM, and persistent storage usage.

Our solution is to include information in each event about several important related processes, but not about its entire process ancestry. Each event includes information about the following processes:

  • first connected ("inception") session process;

  • session leader process;

  • last known user-entered process;

  • parent process; and,

  • current (“self”) process, in which the event occurred.

The processes other than parent and self are described below.

Linux process model: Foundational concepts

Linux processes form a hierarchy: each process is created by a single parent process (using the fork() or clone() system calls) and inherits much of its information, including open file descriptors (e.g. stdin, stdout, stderr) and the Linux user ID. A session is a set of related processes, such as those in an SSH login, and its session leader is the ancestor process that created it. A session leader's process ID (PID) is identical to its session ID (SID), and every other process in the session has the SID of that session's leader.

Linux can reuse PIDs over time (though never for two processes at the same time), so do not rely on the PID or SID alone to correlate events. Instead, use the process UUIDs generated by Cmd, which take into account the boot ID, process start time, and other information to ensure they are unique. You can learn much more about the process hierarchy by running ps axjf in a Linux shell.
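The parent, process group, and session relationships described above are visible in /proc. The Python sketch below parses /proc/<pid>/stat using the field layout from the proc(5) man page and checks whether a process is a session leader. The illustrative_process_key helper is hypothetical: it combines the boot ID, PID, and start time only to show why a PID alone is not unique, and it is not how Cmd computes its process UUIDs.

```python
import os
from typing import Tuple


def parse_stat(text: str) -> dict:
    """Parse the contents of /proc/<pid>/stat, as documented in proc(5)."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split around the LAST ')' in the line.
    head, _, tail = text.rpartition(")")
    pid_text, _, comm = head.partition(" (")
    rest = tail.split()  # rest[0] is field 3 (state); field N is rest[N - 3]
    return {
        "pid": int(pid_text),
        "comm": comm,
        "ppid": int(rest[1]),        # field 4: parent PID
        "pgid": int(rest[2]),        # field 5: process group ID
        "sid": int(rest[3]),         # field 6: session ID
        "tty_nr": int(rest[4]),      # field 7: controlling terminal
        "starttime": int(rest[19]),  # field 22: clock ticks after boot
    }


def read_stat(pid="self") -> dict:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


def is_session_leader(stat: dict) -> bool:
    """A session leader's PID equals its session ID."""
    return stat["pid"] == stat["sid"]


def illustrative_process_key(pid="self") -> Tuple[str, int, int]:
    """Hypothetical key that stays unique even when PIDs are reused.

    NOT Cmd's UUID algorithm; it only shows why boot ID and start time matter.
    """
    with open("/proc/sys/kernel/random/boot_id") as f:
        boot_id = f.read().strip()
    stat = read_stat(pid)
    return boot_id, stat["pid"], stat["starttime"]


if __name__ == "__main__":
    me = read_stat()
    print(me["pid"], me["ppid"], me["pgid"], me["sid"], is_session_leader(me))
```

The printed PID, PPID, PGID, and SID correspond to the PID, PPID, PGID, and SID columns that ps axjf shows.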

Process: “Inception session”

The inception session is the process created when a user first enters a server and exchanges credentials. For example, a Bash shell that results from logging in with SSH, AWS SSM, a serial terminal, or a console is an inception session. When shared Linux users such as "ubuntu" or "ec2-user" are not in use, each inception session process is reliably associated with a user, based on their server login credentials. Even with shared users, if you use Cmd to require MFA after login, Cmd provides information about the actual user and their roles (in the cmd_user and cmd_roles fields).

There is one sub-category of inception session processes: internal inception sessions. These represent services, typically ones started when the server boots, such as web servers, databases, and sshd. Specifically, they are the processes started by the init process (PID 1, typically systemd). You can differentiate between internal and external inception sessions by looking at their parent PIDs, their ancestor sessions, and whether they are interactive (have a controlling terminal).

Process: “Session leader”

A session leader is the process that starts a session. Typically, the inception session for an event is also its session leader. Exceptions can occur with terminal multiplexers such as tmux or screen, which call setsid() to start a new session. Entering a multiplexed session does not change the inception session, but it does change the session leader of events from that session to the leader of the multiplexed session. Multiplexed shell sessions persist after you log out of your inception session. System administrators often use them when facing network instability: they can reconnect and reattach to the multiplexed session, and Cmd still associates the events from the multiplexed session with the same inception session.
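To see the kind of session change a multiplexer causes, the Python sketch below starts a child with start_new_session=True, which makes the child call setsid() before it runs its program. The child then leads a new session (its SID equals its PID), while the parent's session is unchanged. This only demonstrates the system call; it is not how tmux or screen are implemented, and the helper name is hypothetical.

```python
import os
import subprocess
import sys


def start_in_new_session() -> subprocess.Popen:
    """Start a short-lived child process that calls setsid() before running."""
    # start_new_session=True makes the child call setsid() between fork and
    # exec, so the child becomes the leader of a brand-new session.
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(5)"],
        start_new_session=True,
    )


if __name__ == "__main__":
    child = start_in_new_session()
    try:
        print("parent PID", os.getpid(), "SID", os.getsid(0))
        print("child  PID", child.pid, "SID", os.getsid(child.pid))
    finally:
        child.terminate()
        child.wait()
```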

Process: “Last known user-entered”

The last known user-entered process (LKUEP) is an estimate of the most recent ancestor of the event's self process that was started by an external human user. Cmd previously determined whether a process was user-entered using bash-specific information; it now uses information that works across more shells (such as ksh and zsh). A process is user-entered if it meets all of these criteria:

  1. Its parent process' stdin is reading from the controlling terminal (i.e. the user could have typed ls).

  2. Its parent process' stderr is writing to the controlling terminal.

  3. Its process group ID (PGID) differs from its parent process’ PGID.

Note that when a shell runs a program that a user enters, for example ls, the shell reads what the user typed (ls and Return) from its controlling terminal and creates a child process for ls in a new process group (with a new PGID). All programs in a pipeline such as cat foo.txt | grep bar | wc -l share one process group, separate from the shell's. Therefore each program in the pipeline (cat, grep, and wc) is user-entered and will be the LKUEP for its descendant processes, until one of those descendants matches the above criteria. Even if a process matches the criteria, it does not become its own LKUEP.
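The criteria and the ancestor walk can be expressed as two small functions. This sketch models processes with a hypothetical Proc dataclass holding (major, minor) device pairs; it is an illustration of the rules above, not Cmd's implementation, and real values would come from exported events or from /proc.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Device = Tuple[int, int]  # (major, minor)


@dataclass
class Proc:
    """Hypothetical process record used only for this illustration."""
    name: str
    pgid: int
    ctty: Optional[Device]    # controlling terminal, None if there is none
    stdin: Optional[Device]   # device behind fd 0, None for pipes etc.
    stderr: Optional[Device]  # device behind fd 2
    parent: Optional["Proc"] = None


def is_user_entered(proc: Proc) -> bool:
    """Apply the three criteria, which compare proc with its parent."""
    parent = proc.parent
    if parent is None or parent.ctty is None:
        return False
    return (
        parent.stdin == parent.ctty       # 1. parent reads from the terminal
        and parent.stderr == parent.ctty  # 2. parent writes errors to it
        and proc.pgid != parent.pgid      # 3. proc is in a new process group
    )


def lkuep(self_proc: Proc) -> Optional[Proc]:
    """Return the nearest strict ancestor that is user-entered, or None."""
    current = self_proc.parent  # a process is never its own LKUEP
    while current is not None:
        if is_user_entered(current):
            return current
        current = current.parent
    return None
```

For the pipeline example, grep (new PGID, started by a shell whose stdin and stderr are the terminal) is user-entered, so any child grep starts gets grep as its LKUEP, while ls typed directly at a login shell has no user-entered ancestor of its own.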

To view the process tree in a terminal, you can use ps ajxf, and to check the stdin and stderr of any processes, you can use lsof -p <pid1>,..,<pidN> | grep -E '(0u|2u)'. Also note that the cmd_parent_cmd_root CQL value is dual-valued: it corresponds to the executable basenames of both the event's LKUEP and its parent process.
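If lsof is not available, a similar check can be sketched in Python by reading /proc directly: the controlling terminal comes from the tty_nr field of /proc/<pid>/stat, and each file descriptor's device from stat() on /proc/<pid>/fd/<n>. The helper names are hypothetical, and inspecting another user's processes usually requires root, just as it does for lsof.

```python
import os
import stat as stat_mod
from typing import Dict, Optional, Tuple


def _ctty(pid) -> Optional[Tuple[int, int]]:
    """Controlling terminal as (major, minor) from /proc/<pid>/stat, or None."""
    with open(f"/proc/{pid}/stat") as f:
        # Skip "pid (comm)"; the remaining fields start at field 3 (state).
        rest = f.read().rpartition(")")[2].split()
    tty_nr = int(rest[4])  # field 7 of proc(5); 0 means no controlling terminal
    return None if tty_nr == 0 else (os.major(tty_nr), os.minor(tty_nr))


def _fd_char_device(pid, fd: int) -> Optional[Tuple[int, int]]:
    """(major, minor) of the character device open on fd, or None."""
    try:
        st = os.stat(f"/proc/{pid}/fd/{fd}")  # follows the fd link
    except OSError:  # fd closed, process gone, or permission denied
        return None
    if not stat_mod.S_ISCHR(st.st_mode):
        return None  # a file, pipe, or socket rather than a terminal
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def stdio_on_ctty(pid="self") -> Dict[str, bool]:
    """Report whether stdin (fd 0) and stderr (fd 2) are the controlling terminal."""
    ctty = _ctty(pid)
    return {
        name: ctty is not None and _fd_char_device(pid, fd) == ctty
        for name, fd in (("stdin", 0), ("stderr", 2))
    }


if __name__ == "__main__":
    print(stdio_on_ctty())
```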

Example: Processes associated with EXEC events

The inception session, session leader, parent, and self processes are present in all exported events, even if some correspond to the same process. For example, consider the case where you log in with SSH and run ls in your login shell. For the EXEC event associated with your login shell, your login shell is both the self process and the inception session, the parent process is sshd, and the session leader is sshd’s session leader. For the EXEC event associated with ls, the inception session process is the login shell, the parent process is the login shell, the session leader is the login shell, and the self process is ls. In rare cases, depending on the sensor technology and its configuration, a process may be absent from the event because it could not be captured quickly enough (e.g. a very short-lived parent process).

TTY Device Major and Minor Numbers

The Linux process information in exported events contains device "major" and "minor" numbers for the controlling terminal, standard input (stdin), standard output (stdout) and standard error (stderr) file descriptors. This section describes how to interpret this information.

Controlling terminals ensure that Control-Z sends a stop signal (SIGTSTP) to the foreground process group, determine whether user input is echoed on screen, and perform other tasks related to the user interface. Rather than requiring every program that can start interactive sessions (such as sshd) to re-implement this logic, the Linux kernel offers TTY devices so the logic can be shared. Each TTY device is identified by a major and minor number.

Controlling terminals in Linux are typically one of the following:

  • Pseudo terminal:
    Session input arrives over the network, for example from SSH or SSM.

  • Serial terminal:
    Session input comes from a UART/serial chip on the motherboard or virtual hardware.

  • Virtual console:
    Session input comes from a keyboard device (USB, PS/2, virtual, etc.).

  • None:
    Services do not need controlling terminals.
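The terminal types above can often be told apart by device number alone. The sketch below uses allocations from the Linux kernel's device list: major 136 to 143 for Unix98 pseudo-terminal slaves (/dev/pts/N), and major 4 for virtual consoles (minors 0 to 63) and standard serial ports (minors 64 to 255). It assumes that "no controlling terminal" appears as 0,0, as it does in the tty_nr field of /proc; Cmd's own ranges may be broader, for example to cover serial drivers that use other majors, which this sketch reports as "other".

```python
def terminal_type(major: int, minor: int) -> str:
    """Classify a controlling-terminal device number (sketch; see lead-in)."""
    if (major, minor) == (0, 0):
        return "none"             # assumed encoding of "no controlling terminal"
    if 136 <= major <= 143:
        return "pseudo terminal"  # Unix98 PTY slaves, /dev/pts/N
    if major == 4 and 0 <= minor <= 63:
        return "virtual console"  # /dev/tty0 through /dev/tty63
    if major == 4 and 64 <= minor <= 255:
        return "serial terminal"  # /dev/ttyS0 through /dev/ttyS191
    return "other"                # e.g. serial drivers with other majors, files


if __name__ == "__main__":
    for pair in [(136, 0), (4, 1), (4, 64), (0, 0), (8, 1)]:
        print(pair, terminal_type(*pair))
```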

Major and Minor Number Ranges for Controlling Terminals

The device major and minor numbers associated with a process' stdin, stdout, and stderr are included in exported events. If these file descriptors are bound to the controlling terminal, their numbers match the controlling terminal's. If they are associated with a file, the numbers are those of the block device (disk) and partition where that file resides, e.g. "8,1" (/dev/sda1, the first partition of the first SCSI or SATA disk).
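To see which pair to expect when a stream is redirected from a file, you can look up the file's st_dev. This minimal sketch (the function name is hypothetical) formats it in the same "major,minor" style; lsblk shows the same pairs for disks and partitions in its MAJ:MIN column. On some filesystems, such as btrfs or overlayfs, st_dev may be an anonymous device with major 0 rather than a disk partition.

```python
import os


def file_device(path: str) -> str:
    """Return "major,minor" of the device that holds path, e.g. "8,1"."""
    st = os.stat(path)
    # st_dev is the device the file resides on; st_rdev would instead be the
    # device that a special file (such as a TTY) itself represents.
    return f"{os.major(st.st_dev)},{os.minor(st.st_dev)}"


if __name__ == "__main__":
    # For a command run as "sort < /etc/hosts", stdin would show this pair.
    print(file_device("/etc/hosts"))
```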

The full list of Linux devices can be found here.

View a complete list of the fields in EXEC, BUILTIN, and BACKFILL events here.
