Commit 2543bde

Add Google Cloud Storage (GCS) backend support
- Implement GCS backend with full Backend interface support
- Add GCS configuration flags and environment variables
- Include integration tests for GCS backend
- Document GCS usage, authentication, and Anywhere Cache setup
- Update README with GCS performance guidance and best practices

GCS backend provides a GCP-native alternative to S3, with optional Anywhere Cache support for improved read performance.
1 parent 42e4d0f commit 2543bde

6 files changed

Lines changed: 791 additions & 20 deletions

README.md

Lines changed: 161 additions & 8 deletions
@@ -21,7 +21,7 @@
2121

2222
Effectively, `gobuildcache` leverages S3OZ as a distributed build cache for concurrent `go build` or `go test` processes regardless of whether they're running on a single machine or distributed across a fleet of CI VMs. This dramatically improves CI performance for large Go repositories because each CI process will behave as if running with an almost completely pre-populated build cache, even if the CI process was started on a completely ephemeral VM that has never compiled code or executed tests for the repository before.
2323

24-
`gobuildcache` is highly sensitive to the latency of the remote storage backend, so it works best when running on self-hosted runners in AWS targeting an S3 Express One Zone bucket in the same region as the self-hosted runners. That said, it doesn't have to be used that way. For example, if you're using Github's hosted runners or self-hosted runners outside of AWS, you can use a different storage solution like Tigris. See `examples/github_actions_tigris.yml` for an example of using `gobuildcache` with Tigris.
24+
`gobuildcache` is highly sensitive to the latency of the remote storage backend, so it works best when running on self-hosted runners in AWS targeting an S3 Express One Zone bucket in the same region (and ideally same availability zone) as the self-hosted runners. That said, it doesn't have to be used that way. For example, if you're using GitHub's hosted runners or self-hosted runners outside of AWS, you can use a different storage solution like Tigris or Google Cloud Storage (GCS). For GCP users, enabling GCS Anywhere Cache can provide performance similar to S3OZ for read-heavy workloads. See `examples/github_actions_tigris.yml` for an example of using `gobuildcache` with Tigris.
2525

2626
# Quick Start
2727

@@ -41,7 +41,9 @@ go test ./...
4141

4242
By default, `gobuildcache` uses an on-disk cache stored in the OS default temporary directory. This is useful for testing and experimentation with `gobuildcache`, but provides no benefits over the Go compiler's built-in cache, which also stores cached data locally on disk.
4343

44-
For "production" use-cases in CI, you'll want to configure `gobuildcache` to use S3 Express One Zone, or a similarly low latency distributed backend.
44+
For "production" use-cases in CI, you'll want to configure `gobuildcache` to use S3 Express One Zone, Google Cloud Storage, or a similarly low-latency distributed backend.
45+
46+
### Using S3
4547

4648
```bash
4749
export GOBUILDCACHE_BACKEND_TYPE=s3
@@ -63,6 +65,86 @@ go test ./...
6365

6466
> **Note**: All configuration environment variables support both `GOBUILDCACHE_<KEY>` and `<KEY>` forms (e.g., both `GOBUILDCACHE_S3_BUCKET` and `S3_BUCKET` work). The prefixed version takes precedence if both are set. The prefixed form is recommended to avoid conflicts with other tools. If the prefixed variable is set to an empty string, it falls through to the unprefixed version (or default).
6567
68+
### Using Google Cloud Storage (GCS)
69+
70+
```bash
71+
export GOBUILDCACHE_BACKEND_TYPE=gcs
72+
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
73+
```
74+
75+
GCS authentication uses Application Default Credentials. You can provide credentials in one of the following ways:
76+
77+
1. **Service Account JSON file** (recommended for CI):
78+
```bash
79+
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
80+
export GOCACHEPROG=gobuildcache
81+
export GOBUILDCACHE_BACKEND_TYPE=gcs
82+
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
83+
go build ./...
84+
go test ./...
85+
```
86+
87+
2. **Metadata service** (when running on GCP):
88+
```bash
89+
# No credentials file needed - uses metadata service automatically
90+
export GOCACHEPROG=gobuildcache
91+
export GOBUILDCACHE_BACKEND_TYPE=gcs
92+
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
93+
go build ./...
94+
go test ./...
95+
```
96+
97+
3. **gcloud CLI credentials** (for local development):
98+
```bash
99+
gcloud auth application-default login
100+
export GOCACHEPROG=gobuildcache
101+
export GOBUILDCACHE_BACKEND_TYPE=gcs
102+
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
103+
go build ./...
104+
go test ./...
105+
```
106+
107+
#### GCS Anywhere Cache (Recommended for Performance)
108+
109+
For improved performance, especially in read-heavy workloads, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache). Anywhere Cache provides an SSD-backed zonal read cache that can significantly reduce latency for frequently accessed cache objects.
110+
111+
**Benefits:**
112+
- **Lower read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
113+
- **Reduced costs**: Lower data transfer costs, especially for multi-region buckets, and reduced retrieval fees
114+
- **Better performance**: Especially beneficial when multiple CI jobs access the same cached artifacts
115+
- **Automatic scaling**: Cache capacity and bandwidth scale automatically based on usage
116+
117+
**Requirements:**
118+
- Bucket must be in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)
119+
- CI runners should be in the same zone as the cache for optimal performance
120+
- Anywhere Cache is most effective for read-heavy workloads with high cache hit ratios
121+
122+
**Setup:**
123+
1. Verify your bucket region/zone supports Anywhere Cache
124+
2. Enable Anywhere Cache on your GCS bucket
125+
3. Configure the cache in the same zone as your CI runners for best performance
126+
4. Set admission policy to "First miss" for faster warm-up (caches on first access)
127+
5. Configure TTL based on your needs (1 hour to 7 days, default 24 hours)
128+
129+
```bash
130+
# Enable Anywhere Cache using gcloud CLI
131+
# Replace ZONE_NAME with the zone where your CI runners are located
132+
gcloud storage buckets anywhere-caches create \
133+
gs://YOUR_BUCKET_NAME \
134+
ZONE_NAME \
135+
--admission-policy=ADMIT_ON_FIRST_MISS \
136+
--ttl=7d
137+
```
138+
139+
**Note:**
140+
- Anywhere Cache only accelerates reads. Writes still go directly to the bucket, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance.
141+
- First-time access to an object will still hit the bucket (cache miss), but subsequent reads will be served from the cache.
142+
- For best results, ensure your CI runners and cache are in the same zone.
143+
144+
For more details, including availability by region, see the [GCS Anywhere Cache documentation](https://cloud.google.com/storage/docs/anywhere-cache).
145+
146+
#### AWS Credentials Permissions
147+
66148
Your credentials must have the following permissions:
67149

68150
```json
@@ -97,15 +179,36 @@ Your credentials must have the following permissions:
97179
}
98180
```
99181

182+
#### GCS Credentials Permissions
183+
184+
Your GCS service account must have the following IAM permissions:
185+
186+
- `storage.objects.create` - to upload cache objects
187+
- `storage.objects.get` - to download cache objects
188+
- `storage.objects.delete` - to delete cache objects (for clearing)
189+
- `storage.objects.list` - to list objects (for clearing)
190+
191+
The simplest way is to grant the `Storage Object Admin` role to your service account:
192+
193+
```bash
194+
gcloud projects add-iam-policy-binding PROJECT_ID \
195+
--member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
196+
--role="roles/storage.objectAdmin"
197+
```
198+
199+
Or for more granular control, create a custom role with only the required permissions.
200+
100201
## Github Actions Example
101202

102203
See the `examples` directory for examples of how to use `gobuildcache` in a Github Actions workflow.
103204

104-
## S3 Lifecycle Policy
205+
## Lifecycle Policies
105206

106-
It's recommended to configure a lifecycle policy on your S3 bucket to automatically expire old cache entries and control storage costs. Build cache data is typically only useful for a limited time (e.g., a few days to a week), after which it's likely stale.
207+
It's recommended to configure a lifecycle policy on your storage bucket to automatically expire old cache entries and control storage costs. Build cache data is typically only useful for a limited time (e.g., a few days to a week), after which it's likely stale.
107208

108-
Here's a sample lifecycle policy that expires objects after 7 days and aborts incomplete multipart uploads after 24 hours:
209+
### S3 Lifecycle Policy
210+
211+
Here's a sample S3 lifecycle policy that expires objects after 7 days and aborts incomplete multipart uploads after 24 hours:
109212

110213
```json
111214
{
@@ -127,6 +230,28 @@ Here's a sample lifecycle policy that expires objects after 7 days and aborts in
127230
}
128231
```
129232

233+
### GCS Lifecycle Policy
234+
235+
For GCS, you can configure a lifecycle policy using `gsutil` or the GCP Console. Here's an example using `gsutil` that expires objects after 7 days:
236+
237+
```bash
238+
cat > lifecycle.json <<EOF
239+
{
240+
  "lifecycle": {
241+
    "rule": [
242+
      {
243+
        "action": {"type": "Delete"},
244+
        "condition": {"age": 7}
245+
      }
246+
    ]
247+
  }
248+
}
249+
EOF
250+
gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET_NAME
251+
```
252+
253+
Alternatively, in the GCP Console, navigate to your bucket → Lifecycle → Add a rule, set the condition to "Age" of 7 days, and set the action to "Delete".
254+
130255
# Preventing Cache Bloat
131256

132257
`gobuildcache` performs zero automatic GC or trimming of the local filesystem cache or the remote cache backend. Therefore, it is recommended that you run your CI on VMs with ephemeral storage and do not persist storage between CI runs. In addition, you should ensure that your remote cache backend has a lifecycle policy configured like the one described in the previous section.
@@ -141,7 +266,7 @@ gobuildcache clear-local
141266
gobuildcache clear-remote
142267
```
143268

144-
The clear commands take the same flags / environment variables as the regular `gobuildcache` tool, so for example you can provide the `cache-dir` flag or `CACHE_DIR` environment variable to the `clear-local` command and the `s3-bucket` flag or `S3_BUCKET` environment variable to the `clear-remote` command.
269+
The clear commands take the same flags / environment variables as the regular `gobuildcache` tool, so for example you can provide the `cache-dir` flag or `CACHE_DIR` environment variable to the `clear-local` command and the `s3-bucket` flag or `S3_BUCKET` environment variable (or `gcs-bucket`/`GCS_BUCKET` for GCS) to the `clear-remote` command.
145270

146271
# Configuration
147272

@@ -151,12 +276,14 @@ All environment variables support both `GOBUILDCACHE_<KEY>` and `<KEY>` forms (e
151276

152277
| Flag | Environment Variable | Default | Description |
153278
|------|----------------------|---------|-------------|
154-
| `-backend` | `GOBUILDCACHE_BACKEND_TYPE` | `disk` | Backend type: `disk` or `s3` |
279+
| `-backend` | `GOBUILDCACHE_BACKEND_TYPE` | `disk` | Backend type: `disk`, `s3`, or `gcs` |
155280
| `-lock-type` | `GOBUILDCACHE_LOCK_TYPE` | `fslock` | Locking: `fslock` or `memory` |
156281
| `-cache-dir` | `GOBUILDCACHE_CACHE_DIR` | `$TMPDIR/gobuildcache/cache` | Local cache directory |
157282
| `-lock-dir` | `GOBUILDCACHE_LOCK_DIR` | `$TMPDIR/gobuildcache/locks` | Filesystem lock directory |
158283
| `-s3-bucket` | `GOBUILDCACHE_S3_BUCKET` | (none) | S3 bucket name (required for S3) |
159284
| `-s3-prefix` | `GOBUILDCACHE_S3_PREFIX` | (empty) | S3 key prefix |
285+
| `-gcs-bucket` | `GOBUILDCACHE_GCS_BUCKET` | (none) | GCS bucket name (required for GCS) |
286+
| `-gcs-prefix` | `GOBUILDCACHE_GCS_PREFIX` | (empty) | GCS object prefix |
160287
| `-debug` | `GOBUILDCACHE_DEBUG` | `false` | Enable debug logging |
161288
| `-stats` | `GOBUILDCACHE_PRINT_STATS` | `false` | Print cache statistics on exit |
162289
| `-read-only` | `GOBUILDCACHE_READ_ONLY` | `false` | Read-only mode: allow cache reads but skip writes |
@@ -175,6 +302,7 @@ graph TB
175302
GBC -->|2. reads/writes| LFS[Local Filesystem Cache]
176303
GBC -->|3. GET/PUT| Backend{Backend Type}
177304
Backend --> S3OZ[S3 Express One Zone]
305+
Backend --> GCS[Google Cloud Storage]
178306
```
179307

180308
## Processing `GET` commands
@@ -280,4 +408,29 @@ Yes, but the latency of regular S3 is 10-20x higher than S3OZ, which undermines
280408

281409
## Do I have to use `gobuildcache` with self-hosted runners in AWS and S3OZ?
282410

283-
No, you can use `gobuildcache` any way you want as long as the `gobuildcache` binary can reach the remote storage backend. For example, you could run it on your laptop and use regular S3, R2, or Tigris as the remote object storage solution. However, `gobuildcache` works best when the latency of remote backend operations (`GET` and `PUT`) is low, so for best performance we recommend using self-hosted CI running in AWS and targeting a S3OZ bucket in the same region as your CI runners.
411+
No, you can use `gobuildcache` any way you want as long as the `gobuildcache` binary can reach the remote storage backend. For example, you could run it on your laptop and use regular S3, R2, Tigris, or Google Cloud Storage as the remote object storage solution. However, `gobuildcache` works best when the latency of remote backend operations (`GET` and `PUT`) is low, so for best performance we recommend:
412+
413+
- **AWS**: Self-hosted CI running in AWS targeting an S3OZ bucket in the same region (and ideally same availability zone) as your CI runners
414+
- **GCP**: Self-hosted CI running in GCP targeting a GCS Regional Standard bucket in the same region as your CI runners. For even better performance, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) to get zonal read caching.
415+
416+
## Can I use Google Cloud Storage instead of S3?
417+
418+
Yes! `gobuildcache` supports Google Cloud Storage (GCS) as a backend. GCS is a good alternative to S3, especially if you're already using GCP infrastructure.
419+
420+
**Performance Considerations:**
421+
422+
- **Standard GCS**: While GCS doesn't have an exact equivalent to S3 Express One Zone's single-AZ storage, using GCS Regional Standard buckets in the same region as your compute provides good performance.
423+
424+
- **GCS with Anywhere Cache** (Recommended): For read-heavy workloads like build caches, enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) can significantly improve performance:
425+
- **Read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
426+
- **Cost savings**: Reduced data transfer costs and lower read operation costs
427+
- **Best for**: Workloads where the same cache objects are accessed multiple times (common in CI where multiple jobs may access the same artifacts)
428+
429+
Anywhere Cache is particularly effective when:
430+
- Your CI runners are in the same zone as the cache
431+
- You have high cache hit ratios (same objects accessed repeatedly)
432+
- Your bucket is in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)
433+
434+
- **Write latency**: GCS write latency may be higher than S3OZ, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance significantly.
435+
436+
**Recommendation**: If you're using GCP and want performance closer to S3OZ, use GCS Regional Standard buckets with Anywhere Cache enabled in the same zone as your CI runners. This provides excellent read performance while maintaining better durability than single-AZ storage.

go.mod

Lines changed: 33 additions & 1 deletion
@@ -3,15 +3,21 @@ module github.com/richardartoul/gobuildcache
33
go 1.25
44

55
require (
6+
cloud.google.com/go/storage v1.40.0
67
github.com/DataDog/sketches-go v1.4.6
78
github.com/aws/aws-sdk-go-v2 v1.32.7
89
github.com/aws/aws-sdk-go-v2/config v1.28.7
910
github.com/aws/aws-sdk-go-v2/service/s3 v1.71.1
1011
github.com/gofrs/flock v0.13.0
1112
github.com/pierrec/lz4/v4 v4.1.23
13+
google.golang.org/api v0.170.0
1214
)
1315

1416
require (
17+
cloud.google.com/go v0.112.1 // indirect
18+
cloud.google.com/go/compute v1.24.0 // indirect
19+
cloud.google.com/go/compute/metadata v0.2.3 // indirect
20+
cloud.google.com/go/iam v1.1.7 // indirect
1521
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.7 // indirect
1622
github.com/aws/aws-sdk-go-v2/credentials v1.17.48 // indirect
1723
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.22 // indirect
@@ -27,6 +33,32 @@ require (
2733
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.7 // indirect
2834
github.com/aws/aws-sdk-go-v2/service/sts v1.33.3 // indirect
2935
github.com/aws/smithy-go v1.22.1 // indirect
36+
github.com/felixge/httpsnoop v1.0.4 // indirect
37+
github.com/go-logr/logr v1.4.1 // indirect
38+
github.com/go-logr/stdr v1.2.2 // indirect
39+
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
40+
github.com/golang/protobuf v1.5.4 // indirect
41+
github.com/google/s2a-go v0.1.7 // indirect
42+
github.com/google/uuid v1.6.0 // indirect
43+
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
44+
github.com/googleapis/gax-go/v2 v2.12.3 // indirect
45+
go.opencensus.io v0.24.0 // indirect
46+
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.49.0 // indirect
47+
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
48+
go.opentelemetry.io/otel v1.24.0 // indirect
49+
go.opentelemetry.io/otel/metric v1.24.0 // indirect
50+
go.opentelemetry.io/otel/trace v1.24.0 // indirect
51+
golang.org/x/crypto v0.21.0 // indirect
52+
golang.org/x/net v0.22.0 // indirect
53+
golang.org/x/oauth2 v0.18.0 // indirect
54+
golang.org/x/sync v0.6.0 // indirect
3055
golang.org/x/sys v0.37.0 // indirect
31-
google.golang.org/protobuf v1.32.0 // indirect
56+
golang.org/x/text v0.14.0 // indirect
57+
golang.org/x/time v0.5.0 // indirect
58+
google.golang.org/appengine v1.6.8 // indirect
59+
google.golang.org/genproto v0.0.0-20240213162025-012b6fc9bca9 // indirect
60+
google.golang.org/genproto/googleapis/api v0.0.0-20240314234333-6e1732d8331c // indirect
61+
google.golang.org/genproto/googleapis/rpc v0.0.0-20240311132316-a219d84964c2 // indirect
62+
google.golang.org/grpc v1.62.1 // indirect
63+
google.golang.org/protobuf v1.33.0 // indirect
3264
)
