Fast, concurrent, streaming access to Amazon S3, including gof3r, a CLI. http://godoc.org/github.com/rlmcpherson/s3gof3r
Michael Haggerty f36aee13dd putter.tryPut(): new method, extracted from `putter.Close()`
This makes the error handling simpler, while simultaneously fixing
some problems:

* `checkClosed()` did not work correctly, because it took its `err`
  parameter as a value rather than a reference.

* The use of `defer` in a loop means that the defers are not run until
  the end of the function, delaying the closure of up to five response
  bodies along with the corresponding error handling.

* A failure to close the response body was treated as an unretryable
  error, which was probably too strict.

* If all 5 retry attempts had failed, the old code would have
  needlessly fallen through to the MD5 sum checking code. I think this
  would have still ended up reporting an error (?), but it's not
  obvious that it would always report the original error.
2021-09-08 08:02:17 +02:00
gof3r Switch to using go modules 2021-09-04 07:57:36 +02:00
.gitignore Merge branch 'master' into fix_at_colon 2017-02-09 19:40:37 -05:00
.travis.yml update travis config 2015-11-09 21:18:20 -05:00
LICENSE.txt Add MIT license 2013-11-18 09:03:04 +00:00
README.md update readme for GO15VENDOREXPERIMENT=1 2015-11-09 21:47:39 -05:00
auth.go Never call both `checkClose()` and `newRespError()` 2021-09-08 08:02:17 +02:00
auth_test.go Use AWS_SECURITY_TOKEN environment variable 2016-02-10 12:28:59 +11:00
delete_multiple.go deleteMultiple(): don't lose `resp.Body.Close()` errors 2021-09-08 08:02:17 +02:00
getter.go getter.getChunk(): don't call `resp.Body.Close()` multiple times 2021-09-08 08:02:17 +02:00
getter_test.go Fix hang when retries fail. 2020-06-12 08:43:55 -05:00
go.mod Switch to using go modules 2021-09-04 07:57:36 +02:00
go.sum Switch to using go modules 2021-09-04 07:57:36 +02:00
http_client.go 0.4.0 Release. 2014-06-27 19:28:00 -04:00
list_objects.go Never call both `checkClose()` and `newRespError()` 2021-09-08 08:02:17 +02:00
list_objects_test.go Add a function call for listing objects under a prefix 2017-06-26 08:24:44 +02:00
pool.go getter slice pool. remove old pool code. 2014-10-06 23:46:55 -04:00
pool_test.go exclude pool_test from race detector testing 2015-03-11 18:55:23 -04:00
putter.go putter.tryPut(): new method, extracted from `putter.Close()` 2021-09-08 08:02:17 +02:00
s3gof3r.go Bucket.delete(): don't lose `resp.Body.Close()` errors 2021-09-08 08:02:17 +02:00
s3gof3r_test.go Add support for the multi-delete endpoint 2017-03-30 18:20:39 +02:00
sign.go Allow ampersands and colons in key names 2016-11-28 16:07:43 +01:00
sign_test.go Sign request using AWS v4 signature 2015-10-29 18:50:55 -07:00
util.go newRespError(): make it clear that we're intentionally ignoring errors 2021-09-08 08:02:17 +02:00

README.md

s3gof3r

s3gof3r provides fast, parallelized, pipelined streaming access to Amazon S3. It includes a command-line interface: gof3r.

It is optimized for high speed transfer of large objects into and out of Amazon S3. Streaming support allows for usage like:

  $ tar -czf - <my_dir/> | gof3r put -b <s3_bucket> -k <s3_object>    
  $ gof3r get -b <s3_bucket> -k <s3_object> | tar -zx

Speed Benchmarks

On an EC2 instance, gof3r can exceed 1 Gbps for both puts and gets:

  $ gof3r get -b test-bucket -k 8_GB_tar | pv -a | tar -x
  Duration: 53.201632211s
  [ 167MB/s]
  

  $ tar -cf - test_dir/ | pv -a | gof3r put -b test-bucket -k 8_GB_tar
  Duration: 1m16.080800315s
  [ 119MB/s]

These tests were performed on an m1.xlarge EC2 instance with a virtualized 1 Gigabit Ethernet interface. See Amazon EC2 Instance Details for more information.

Features

  • Speed: Especially for larger S3 objects, where parallelism can be exploited, s3gof3r can saturate the bandwidth of an EC2 instance. See the benchmarks above.

  • Streaming Uploads and Downloads: As the examples above illustrate, streaming allows the gof3r command-line tool to be used with Linux/Unix pipes. This allows the data to be transformed in parallel as it is uploaded to or downloaded from S3.

  • End-to-end Integrity Checking: s3gof3r calculates the MD5 hash of the stream in parallel while uploading and downloading. On upload, a file containing the MD5 hash is saved in S3; on download, this hash is checked against the MD5 calculated from the downloaded stream. Also on upload, the MD5 of each part is calculated and sent in the Content-MD5 header so that AWS can verify it. s3gof3r also checks the 'hash of hashes' returned by S3 in the ETag field on completion of a multipart upload. See the S3 API Reference for details.

  • Retry Everything: Every HTTP request, and every part, is retried on both uploads and downloads. Requests to S3 frequently time out, especially under high load, so this is essential for completing large uploads or downloads.

  • Memory Efficiency: Memory used to upload and download parts is recycled. For an upload or download with the default concurrency of 10 and part size of 20 MB, the maximum memory usage is less than 300 MB. The memory footprint can be reduced further by lowering the part size or concurrency.

Installation

s3gof3r is written in Go and requires Go 1.5 or later. It can be installed with go get, which downloads and compiles it from source. To install the command-line tool, gof3r, set GO15VENDOREXPERIMENT=1 in your environment:

$ export GO15VENDOREXPERIMENT=1
$ go get github.com/rlmcpherson/s3gof3r/gof3r

To install just the package for use in other Go programs:

$ go get github.com/rlmcpherson/s3gof3r

Release Binaries

To try the latest release of the gof3r command-line interface without installing Go, download the statically linked binary for your architecture from GitHub Releases.

gof3r (command-line interface) usage:

  To stream up to S3:
     $ <input_stream> | gof3r put -b <bucket> -k <s3_path>
  To stream down from S3:
     $ gof3r get -b <bucket> -k <s3_path> | <output_stream>
  To upload a file to S3:
     $ gof3r cp <local_path> s3://<bucket>/<s3_path>
  To download a file from S3:
     $ gof3r cp s3://<bucket>/<s3_path> <local_path>
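The `cp` form tells S3 locations apart from local paths by the `s3://` prefix. A minimal sketch of how such an argument could be split into bucket and key, assuming the `s3://<bucket>/<s3_path>` layout shown above; `parseS3URL` and `isS3` are hypothetical helpers, not gof3r's own code.

```go
package main

import (
	"errors"
	"strings"
)

// parseS3URL splits "s3://<bucket>/<s3_path>" into bucket and object key.
// Everything after the first "/" following the bucket is kept verbatim as the key.
func parseS3URL(arg string) (bucket, key string, err error) {
	const prefix = "s3://"
	if !strings.HasPrefix(arg, prefix) {
		return "", "", errors.New("not an s3:// URL: " + arg)
	}
	parts := strings.SplitN(strings.TrimPrefix(arg, prefix), "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", errors.New("expected s3://<bucket>/<s3_path>, got " + arg)
	}
	return parts[0], parts[1], nil
}

// isS3 reports whether a cp argument refers to S3 rather than a local path.
func isS3(arg string) bool { return strings.HasPrefix(arg, "s3://") }
```

Plain string splitting is used instead of `net/url` because `url.Parse` would treat a `?` or `#` inside an object key as the start of a query or fragment.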

Set AWS keys as environment variables:

  $ export AWS_ACCESS_KEY_ID=<access_key>
  $ export AWS_SECRET_ACCESS_KEY=<secret_key>

gof3r also supports IAM role-based keys from EC2 instance metadata. If these are available and the environment variables are not set, they are used automatically.

Examples:

$ tar -cf - /foo_dir/ | gof3r put -b my_s3_bucket -k bar_dir/s3_object -m x-amz-meta-custom-metadata:abc123 -m x-amz-server-side-encryption:AES256
$ gof3r get -b my_s3_bucket -k bar_dir/s3_object | tar -x    

See the gof3r man page for complete usage.

Documentation

s3gof3r package: See the godocs for API documentation.

gof3r CLI: godoc and gof3r man page

Have a question? Ask it on the s3gof3r Mailing List