Amazon Redshift is a cloud-hosted relational database built to handle very large datasets. With a few commands you can import S3 data into Redshift and query it there. This article describes how to import Cmd's exported S3 data into Redshift.
How to access exported data
Create a Redshift cluster if necessary. (To learn how, see Amazon's documentation.)
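If you prefer the command line to the console, a cluster can also be created with the AWS CLI. This is a minimal sketch, not a production configuration; the identifier, node type, and credentials below are placeholder values (the identifier and username match the psql example later in this article):

aws redshift create-cluster \
    --cluster-identifier redshift-cluster-1 \
    --node-type dc2.large \
    --number-of-nodes 2 \
    --master-username my-awsuser \
    --master-user-password 'ChangeMe123!'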
Ensure that the cluster has permission to read from the S3 bucket containing Cmd data. (To learn how, see Amazon's documentation.)
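As a rough sketch of that step, assuming an IAM role named my-redshift-role (the role used in the COPY commands below), the broad-brush AWS CLI version looks like the following; a production setup would scope the policy to the specific bucket rather than grant read access to all of S3:

aws iam attach-role-policy \
    --role-name my-redshift-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

aws redshift modify-cluster-iam-roles \
    --cluster-identifier redshift-cluster-1 \
    --add-iam-roles arn:aws:iam::939393939393:role/my-redshift-role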
Redshift is based on Postgres, so it can be accessed with the same JDBC/ODBC drivers and clients used with Postgres, including psql. Connect with psql, then run the SQL below, replacing the host, user, bucket name, and iam_role ARN with your own:
psql -h redshift-cluster-1.some-obfuscated-name.us-west-1.redshift.amazonaws.com -U my-awsuser -d dev -p 5439
(To learn more about connecting to Redshift via psql, see Amazon's documentation.)
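Once connected, a quick sanity check confirms that the session is actually against a Redshift cluster rather than a stock Postgres server; Redshift identifies itself in the version string alongside the PostgreSQL base version:

SELECT version();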
CREATE TABLE IF NOT EXISTS project_YYY_commands (
    id VARCHAR(48) NOT NULL,
    created TIMESTAMP NOT NULL SORTKEY DISTKEY,
    exec_user VARCHAR(64),
    command VARCHAR(65535),
    command_line VARCHAR(65535),
    command_type VARCHAR(32),
    exec_path VARCHAR(4096),
    cwd VARCHAR(4096),
    pid INT,
    parent_pid INT,
    origin VARCHAR(32),
    parent_command_base VARCHAR(255),
    top_command_base VARCHAR(255),
    shell_type VARCHAR(32),
    command_number INT,
    interactive_shell BOOL,
    executing BOOL,
    line_number INT,
    shell_login INT,
    startup_state INT,
    shell_version VARCHAR(128),
    parent_id VARCHAR(48),
    session_id VARCHAR(48),
    server_id VARCHAR(48),
    project_id VARCHAR(48),
    company_id VARCHAR(48)
) DISTSTYLE KEY;
CREATE TABLE IF NOT EXISTS project_YYY_sessions (
    id VARCHAR(48) NOT NULL,
    created TIMESTAMP NOT NULL SORTKEY DISTKEY,
    login_user VARCHAR(64),
    command VARCHAR(255),
    how_connected VARCHAR(32),
    interactive BOOL,
    pid INT,
    ip VARCHAR(64),
    tz_std VARCHAR(128),
    tz_dst VARCHAR(128),
    utc_offset INT,
    has_dst BOOL,
    server_name VARCHAR(255),
    server_id VARCHAR(48),
    project_id VARCHAR(48),
    company_id VARCHAR(48),
    cmd_agent_version VARCHAR(48)
) DISTSTYLE KEY;
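Both tables distribute and sort on created (DISTSTYLE KEY, with created as DISTKEY and SORTKEY), which suits the time-range queries typical of command history. If you want to confirm the keys were applied as intended, one way is to inspect the catalog; this assumes the tables were created in a schema on your search_path, since pg_table_def only reports on those:

SELECT "column", type, distkey, sortkey
FROM pg_table_def
WHERE tablename = 'project_yyy_commands';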
COPY project_YYY_commands FROM 's3://my-bucket/cmd_export/cmd-v1/CMP-XXX/PRJ-YYY/' IAM_ROLE 'arn:aws:iam::939393939393:role/my-redshift-role' JSON 'auto' GZIP TIMEFORMAT 'auto';
COPY project_YYY_sessions FROM 's3://my-bucket/cmd_export/sess-v1/CMP-XXX/PRJ-YYY/' IAM_ROLE 'arn:aws:iam::939393939393:role/my-redshift-role' JSON 'auto' GZIP TIMEFORMAT 'auto';
VACUUM;
ANALYZE;
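If either COPY aborts or loads fewer rows than expected, Redshift records per-row failures in the stl_load_errors system table. A query like the following (a standard Redshift diagnostic, not specific to this export) shows the most recent ones:

SELECT starttime, filename, line_number, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;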
Once the two tables are created and the two COPY commands have loaded them with data, you can run queries like the following (shown here with psql's expanded output, \x, enabled):
SELECT s.ip, s.how_connected, c.*
FROM project_YYY_commands c
JOIN project_YYY_sessions s ON c.session_id = s.id
WHERE c.exec_path = '/bin/su';
-[ RECORD 1 ]-------+-------------------------------------
ip | 10.14.82.130
how_connected | sshd
id | fdfb748b-2be2-458d-be04-0ebd3e77d7e8
created | 2018-08-22 22:05:16.614363
exec_user | root
command | su
command_line |
command_type | EXEC
exec_path | /bin/su
cwd | /home/peterson
pid | 3223
parent_pid | 3222
origin | shell_uec
parent_command_base | sudo
top_command_base | sudo
shell_type |
command_number | 0
interactive_shell | f
executing | f
line_number | 0
shell_login | 0
startup_state | 0
shell_version |
parent_id |
session_id | 745fe88f-da03-44df-ba14-0be538ad5be7
server_id | SVR-MG
project_id | PRJ-YYY
company_id | CMP-XXX
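Beyond targeted lookups like the one above, the same tables support aggregate reporting. As one more illustrative sketch (the grouping and limit here are arbitrary choices), this counts commands per user per day:

SELECT exec_user, DATE_TRUNC('day', created) AS day, COUNT(*) AS commands
FROM project_YYY_commands
GROUP BY exec_user, DATE_TRUNC('day', created)
ORDER BY commands DESC
LIMIT 20;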