Amazon S3 Select lets you explore a single S3 object using SQL queries. Unlike our examples with Redshift and Redshift Spectrum, both of which give access to all S3 objects sharing a common key prefix, S3 Select analyzes only one object at a time.

A key benefit of S3 Select is that it works without configuring additional services such as Redshift. Also, because the query is executed by S3, your client downloads only the query results, not the entire object. This is ideal when network bandwidth is limited, and it can help reduce data transfer costs.

In the following example, the AWS command line interface (CLI) runs an S3 Select query. Before running it yourself, install the AWS CLI and make sure your system has the credentials needed to access your S3 bucket. Replace 'my-bucket' with your bucket name, and the example key with the key of the object you wish to inspect. The query results are saved to the file ./result1.json as newline-delimited JSON (records are separated by a newline by default).

aws s3api select-object-content \
    --bucket my-bucket \
    --key cmd_export/cmd-v1/CMP-XXX/PRJ-YYY/2018/08/20/uw1_cmd-v1_PRJ-YYY_000_000000000024074.json.gz \
    --expression-type SQL \
    --expression "select * FROM S3Object s WHERE s.exec_path='/bin/rm'" \
    --input-serialization '{"CompressionType":"GZIP","JSON": {"Type":"LINES"}}' \
    --output-serialization '{"JSON": {}}' \
    ./result1.json
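
If you would rather run the same query from a script, the following is a minimal Python sketch using the boto3 library's select_object_content call. It is not part of the original example: the helper names (build_select_params, collect_records, run_select) are hypothetical, and it assumes boto3 is installed and can find the same credentials as the AWS CLI. S3 returns the results as a stream of events, and a single record may be split across two events, so the sketch joins all the Records payloads before parsing them.

```python
import json


def build_select_params(bucket, key, expression):
    """Build the keyword arguments for boto3's S3 select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Input is gzip-compressed, newline-delimited JSON, as in the CLI example.
        "InputSerialization": {"CompressionType": "GZIP", "JSON": {"Type": "LINES"}},
        # Request JSON output; records are returned one per line.
        "OutputSerialization": {"JSON": {}},
    }


def collect_records(event_stream):
    """Join the Records payloads from the event stream, then parse each line."""
    chunks = []
    for event in event_stream:
        # Other event types (for example Stats and End) carry no record data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode only after joining: a record can be split across two events.
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def run_select(bucket, key, expression):
    """Run the query against a single object and return the matching records."""
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_params(bucket, key, expression))
    return collect_records(response["Payload"])


# Example call (requires AWS credentials and a real bucket and key):
# rows = run_select("my-bucket", "<your object key>",
#                   "SELECT * FROM S3Object s WHERE s.exec_path='/bin/rm'")
```

As with the CLI command, only the matching records travel over the network; the object itself stays in S3.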