To illustrate what is meant by "offline":
- Create a transfer-archive from a data source
- Copy the transfer-archive across a security boundary
- Unpack the transfer-archive to its final destination
Note that each of those steps is a bit more involved than this summary suggests, but it gives you the gist of the major effort-points.
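To make those effort-points concrete, here's a minimal sketch of the flow, assuming a staging-directory that already holds the pulled-down bucket contents, a removable transfer-medium mounted at `/mnt/transfer`, and an `aws s3 sync` push on the far side (all of the names below are stand-ins, not prescribed paths):

```bash
# Hypothetical locations – substitute your own
STAGING_DIR=/var/tmp/bucket-staging
TRANSFER_MEDIUM=/mnt/transfer

# 1. Create a transfer-archive from the data source
tar -czf "${TRANSFER_MEDIUM}/bucket-sync.tgz" -C "${STAGING_DIR}" .

# 2. Physically carry ${TRANSFER_MEDIUM} across the security boundary, then...

# 3. Unpack the transfer-archive and push it to its final destination
mkdir -p /var/tmp/bucket-restore
tar -xzf "${TRANSFER_MEDIUM}/bucket-sync.tgz" -C /var/tmp/bucket-restore
aws s3 sync /var/tmp/bucket-restore/ s3://<DESTINATION_BUCKET>/
```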
The first time you do an offline bucket-sync, transferring the entirety of a bucket is typically the goal. However, for a refresh-sync – particularly on a bucket with more than a trivial amount of content – a full transfer can be less than ideal. For example, it might be necessary to do monthly syncs of a bucket that grows by a few GiB per month. After a year, a full sync can mean having to move tens to hundreds of GiB. A better way is to sync only the deltas – copying just what's changed between the current and immediately-prior sync-tasks (a few GiB rather than tens to hundreds).
The AWS CLI tools don't really have a "sync only the files that have been added/modified since <DATE>" capability. That said, it's not super difficult to work around that gap. A simple shell script like the following does the trick:
```bash
# Build the list of object-paths modified after the cut-off timestamp,
# then pull each one down into the staging-directory
for FILE in $(
      aws s3 ls --recursive s3://<SOURCE_BUCKET>/ | \
         awk '$1" "$2 > "2019-03-01 00:00:00" {print $4}'
   )
do
   echo "Downloading ${FILE}"
   install -bDm 000644 <( aws s3 cp "s3://<SOURCE_BUCKET>/${FILE}" - ) \
      "<STAGING_DIR>/${FILE}"
done
```
To explain the above:
- Create a list of files to iterate:
    - Invoke a subprocess using the `$( )` notation. Within that subprocess...
    - Invoke the AWS CLI's S3 module to recursively list the source-bucket's contents (`aws s3 ls --recursive`)
    - Pipe the output to `awk` – printing only the fourth output-column (the S3 object-path) for any row whose modification timestamp (the first two output-columns) sorts newer than the cut-off date (see the sample run after this list)
- Use a `for` loop to iterate over the previously-assembled list, assigning each S3 object-path to the `${FILE}` variable
- Since I hate sending programs off to do things in silence (I don't trust them not to hang), the first command in the loop announces what's happening via the `echo "Downloading ${FILE}"` directive.
- The `install` line makes use of some niftiness within both BASH and the AWS CLI's S3 command (both demonstrated after this list):
    - By specifying `-` as the "destination" for the file-copy operation, you tell the S3 command to write the fetched object-contents to STDOUT.
    - BASH allows you to take a stream of output and assign a file-handle to it by wrapping the output-producing command in `<( )`.
    - Invoking the `install` command with the `-D` flag tells the command to "create all necessary path-elements to place the source 'file' in the desired location within the filesystem, even if none of the intervening directory-structure exists yet."
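To make the date-filter concrete, here's a sample run against some made-up `aws s3 ls --recursive` output (the object-paths, sizes and timestamps are invented for illustration; the four-column layout is what the real command emits):

```
$ aws s3 ls --recursive s3://<SOURCE_BUCKET>/
2019-02-12 09:14:51       1024 reports/january.csv
2019-03-04 22:07:33      52314 reports/february.csv
2019-03-18 11:42:09     104857 images/header.png

$ aws s3 ls --recursive s3://<SOURCE_BUCKET>/ | \
  awk '$1" "$2 > "2019-03-01 00:00:00" {print $4}'
reports/february.csv
images/header.png
```

One caveat worth noting: because `awk` prints only `$4` and the `for` loop word-splits its input, this approach assumes object-paths that don't contain embedded spaces.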
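Similarly, you can watch the process-substitution and `install -D` behavior in isolation – no AWS involvement required (the paths below are throwaway examples):

```bash
# Hand install a file-handle to an arbitrary output-stream; the -D flag
# creates the missing directory-tree leading to the destination file
install -bDm 000644 <( echo "hello" ) /tmp/demo/deeply/nested/file.txt

cat /tmp/demo/deeply/nested/file.txt   # prints "hello"
```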