Commit a95d98b

Add Google Cloud Storage (GCS) backend support
- Implement GCS backend with full Backend interface support
- Add GCS configuration flags and environment variables
- Include integration tests for GCS backend
- Document GCS usage, authentication, and Anywhere Cache setup
- Update README with GCS performance guidance and best practices

GCS backend provides a GCP-native alternative to S3, with optional Anywhere Cache support for improved read performance.
1 parent e8dd136 commit a95d98b

6 files changed

Lines changed: 792 additions & 21 deletions

README.md

Lines changed: 162 additions & 9 deletions
@@ -21,7 +21,7 @@
2121

2222
Effectively, `gobuildcache` leverages S3OZ as a distributed build cache for concurrent `go build` or `go test` processes regardless of whether they're running on a single machine or distributed across a fleet of CI VMs. This dramatically improves CI performance for large Go repositories because each CI process will behave as if running with an almost completely pre-populated build cache, even if the CI process was started on a completely ephemeral VM that has never compiled code or executed tests for the repository before.
2323

24-
`gobuildcache` is highly sensitive to the latency of the remote storage backend, so it works best when running on self-hosted runners in AWS targeting an S3 Express One Zone bucket in the same region as the self-hosted runners. That said, it doesn't have to be used that way. For example, if you're using Github's hosted runners or self-hosted runners outside of AWS, you can use a different storage solution like Tigris. See `examples/github_actions_tigris.yml` for an example of using `gobuildcache` with Tigris.
24+
`gobuildcache` is highly sensitive to the latency of the remote storage backend, so it works best when running on self-hosted runners in AWS targeting an S3 Express One Zone bucket in the same region (and ideally same availability zone) as the self-hosted runners. That said, it doesn't have to be used that way. For example, if you're using GitHub's hosted runners or self-hosted runners outside of AWS, you can use a different storage solution like Tigris or Google Cloud Storage (GCS). For GCP users, enabling GCS Anywhere Cache can provide performance similar to S3OZ for read-heavy workloads. See `examples/github_actions_tigris.yml` for an example of using `gobuildcache` with Tigris.
2525

2626
# Quick Start
2727

@@ -41,7 +41,9 @@ go test ./...
4141

4242
By default, `gobuildcache` uses an on-disk cache stored in the OS default temporary directory. This is useful for testing and experimentation with `gobuildcache`, but provides no benefits over the Go compiler's built-in cache, which also stores cached data locally on disk.
4343

44-
For "production" use-cases in CI, you'll want to configure `gobuildcache` to use S3 Express One Zone, or a similarly low latency distributed backend.
44+
For "production" use-cases in CI, you'll want to configure `gobuildcache` to use S3 Express One Zone, Google Cloud Storage, or a similarly low latency distributed backend.
45+
46+
### Using S3
4547

4648
```bash
4749
export BACKEND_TYPE=s3
@@ -61,7 +63,87 @@ go build ./...
6163
go test ./...
6264
```
6365

64-
Your credentials must have the following permissions:
66+
### Using Google Cloud Storage (GCS)
67+
68+
```bash
69+
export BACKEND_TYPE=gcs
70+
export GCS_BUCKET=$BUCKET_NAME
71+
```
72+
73+
GCS authentication uses Application Default Credentials. You can provide credentials in one of the following ways:
74+
75+
1. **Service Account JSON file** (recommended for CI):
76+
```bash
77+
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
78+
export GOCACHEPROG=gobuildcache
79+
export BACKEND_TYPE=gcs
80+
export GCS_BUCKET=$BUCKET_NAME
81+
go build ./...
82+
go test ./...
83+
```
84+
85+
2. **Metadata service** (when running on GCP):
86+
```bash
87+
# No credentials file needed - uses metadata service automatically
88+
export GOCACHEPROG=gobuildcache
89+
export BACKEND_TYPE=gcs
90+
export GCS_BUCKET=$BUCKET_NAME
91+
go build ./...
92+
go test ./...
93+
```
94+
95+
3. **gcloud CLI credentials** (for local development):
96+
```bash
97+
gcloud auth application-default login
98+
export GOCACHEPROG=gobuildcache
99+
export BACKEND_TYPE=gcs
100+
export GCS_BUCKET=$BUCKET_NAME
101+
go build ./...
102+
go test ./...
103+
```
104+
105+
#### GCS Anywhere Cache (Recommended for Performance)
106+
107+
For improved performance, especially in read-heavy workloads, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache). Anywhere Cache provides an SSD-backed zonal read cache that can significantly reduce latency for frequently accessed cache objects.
108+
109+
**Benefits:**
110+
- **Lower read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
111+
- **Reduced costs**: Lower data transfer costs, especially for multi-region buckets, and reduced retrieval fees
112+
- **Better performance**: Especially beneficial when multiple CI jobs access the same cached artifacts
113+
- **Automatic scaling**: Cache capacity and bandwidth scale automatically based on usage
114+
115+
**Requirements:**
116+
- Bucket must be in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)
117+
- CI runners should be in the same zone as the cache for optimal performance
118+
- Anywhere Cache is most effective for read-heavy workloads with high cache hit ratios
119+
120+
**Setup:**
121+
1. Verify your bucket region/zone supports Anywhere Cache
122+
2. Enable Anywhere Cache on your GCS bucket
123+
3. Configure the cache in the same zone as your CI runners for best performance
124+
4. Set admission policy to "First miss" for faster warm-up (caches on first access)
125+
5. Configure TTL based on your needs (1 hour to 7 days, default 24 hours)
126+
127+
```bash
128+
# Enable Anywhere Cache using gcloud CLI
129+
# Replace ZONE_NAME with the zone where your CI runners are located
130+
gcloud storage buckets anywhere-caches create \
131+
  gs://YOUR_BUCKET_NAME \
132+
  ZONE_NAME \
133+
  --admission-policy=ADMIT_ON_FIRST_MISS \
134+
  --ttl=7d
135+
```
136+
137+
**Note:**
138+
- Anywhere Cache only accelerates reads. Writes still go directly to the bucket, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance.
139+
- First-time access to an object will still hit the bucket (cache miss), but subsequent reads will be served from the cache.
140+
- For best results, ensure your CI runners and cache are in the same zone.
141+
142+
For more details, including availability by region, see the [GCS Anywhere Cache documentation](https://cloud.google.com/storage/docs/anywhere-cache).
143+
144+
#### AWS Credentials Permissions
145+
146+
Your AWS credentials must have the following permissions:
65147

66148
```json
67149
{
@@ -95,15 +177,36 @@ Your credentials must have the following permissions:
95177
}
96178
```
97179

180+
#### GCS Credentials Permissions
181+
182+
Your GCS service account must have the following IAM permissions:
183+
184+
- `storage.objects.create` - to upload cache objects
185+
- `storage.objects.get` - to download cache objects
186+
- `storage.objects.delete` - to delete cache objects (for clearing)
187+
- `storage.objects.list` - to list objects (for clearing)
188+
189+
The simplest way is to grant the `Storage Object Admin` role to your service account:
190+
191+
```bash
192+
gcloud projects add-iam-policy-binding PROJECT_ID \
193+
--member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
194+
--role="roles/storage.objectAdmin"
195+
```
196+
197+
Or for more granular control, create a custom role with only the required permissions.
198+
98199
## Github Actions Example
99200

100201
See the `examples` directory for examples of how to use `gobuildcache` in a Github Actions workflow.
101202

102-
## S3 Lifecycle Policy
203+
## Lifecycle Policies
103204

104-
It's recommended to configure a lifecycle policy on your S3 bucket to automatically expire old cache entries and control storage costs. Build cache data is typically only useful for a limited time (e.g., a few days to a week), after which it's likely stale.
205+
It's recommended to configure a lifecycle policy on your storage bucket to automatically expire old cache entries and control storage costs. Build cache data is typically only useful for a limited time (e.g., a few days to a week), after which it's likely stale.
105206

106-
Here's a sample lifecycle policy that expires objects after 7 days and aborts incomplete multipart uploads after 24 hours:
207+
### S3 Lifecycle Policy
208+
209+
Here's a sample S3 lifecycle policy that expires objects after 7 days and aborts incomplete multipart uploads after 24 hours:
107210

108211
```json
109212
{
@@ -125,6 +228,28 @@ Here's a sample lifecycle policy that expires objects after 7 days and aborts in
125228
}
126229
```
127230

231+
### GCS Lifecycle Policy
232+
233+
For GCS, you can configure a lifecycle policy using `gsutil` or the GCP Console. Here's an example using `gsutil` that expires objects after 7 days:
234+
235+
```bash
236+
cat > lifecycle.json <<EOF
237+
{
238+
  "lifecycle": {
239+
    "rule": [
240+
      {
241+
        "action": {"type": "Delete"},
242+
        "condition": {"age": 7}
243+
      }
244+
    ]
245+
  }
246+
}
247+
EOF
248+
gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET_NAME
249+
```
250+
251+
Or using the GCP Console, navigate to your bucket → Lifecycle → Add a rule → Set condition to "Age" of 7 days → Action to "Delete".
252+
128253
# Preventing Cache Bloat
129254

130255
`gobuildcache` performs zero automatic GC or trimming of the local filesystem cache or the remote cache backend. Therefore, it is recommended that you run your CI on VMs with ephemeral storage and do not persist storage between CI runs. In addition, you should ensure that your remote cache backend has a lifecycle policy configured like the one described in the previous section.
@@ -139,20 +264,22 @@ gobuildcache clear-local
139264
gobuildcache clear-remote
140265
```
141266

142-
The clear commands take the same flags / environment variables as the regular `gobuildcache` tool, so for example you can provide the `cache-dir` flag or `CACHE_DIR` environment variable to the `clear-local` command and the `s3-bucket` flag or `S3_BUCKET` environment variable to the `clear-remote` command.
267+
The clear commands take the same flags / environment variables as the regular `gobuildcache` tool, so for example you can provide the `cache-dir` flag or `CACHE_DIR` environment variable to the `clear-local` command and the `s3-bucket` flag or `S3_BUCKET` environment variable (or `gcs-bucket`/`GCS_BUCKET` for GCS) to the `clear-remote` command.
143268

144269
# Configuration
145270

146271
`gobuildcache` ships with reasonable defaults, but this section provides a complete overview of flags / environment variables that can be used to override behavior.
147272

148273
| Flag | Environment Variable | Default | Description |
149274
|------|---------------------|---------|-------------|
150-
| `-backend` | `BACKEND_TYPE` | `disk` | Backend type: `disk` or `s3` |
275+
| `-backend` | `BACKEND_TYPE` | `disk` | Backend type: `disk`, `s3`, or `gcs` |
151276
| `-lock-type` | `LOCK_TYPE` | `fslock` | Mechanism for locking: `fslock` (filesystem) or `memory` |
152277
| `-cache-dir` | `CACHE_DIR` | `/$OS_TMP/gobuildcache/cache` | Local cache directory |
153278
| `-lock-dir` | `LOCK_DIR` | `/$OS_TMP/gobuildcache/locks` | Local directory for storing filesystem locks |
154279
| `-s3-bucket` | `S3_BUCKET` | (none) | S3 bucket name (required for S3) |
155280
| `-s3-prefix` | `S3_PREFIX` | (empty) | S3 key prefix |
281+
| `-gcs-bucket` | `GCS_BUCKET` | (none) | GCS bucket name (required for GCS) |
282+
| `-gcs-prefix` | `GCS_PREFIX` | (empty) | GCS object prefix |
156283
| `-debug` | `DEBUG` | `false` | Enable debug logging |
157284
| `-stats` | `PRINT_STATS` | `false` | Print cache statistics on exit |
158285

@@ -170,6 +297,7 @@ graph TB
170297
GBC -->|2. reads/writes| LFS[Local Filesystem Cache]
171298
GBC -->|3. GET/PUT| Backend{Backend Type}
172299
Backend --> S3OZ[S3 Express One Zone]
300+
Backend --> GCS[Google Cloud Storage]
173301
```
174302

175303
## Processing `GET` commands
@@ -275,4 +403,29 @@ Yes, but the latency of regular S3 is 10-20x higher than S3OZ, which undermines
275403

276404
## Do I have to use `gobuildcache` with self-hosted runners in AWS and S3OZ?
277405

278-
No, you can use `gobuildcache` any way you want as long as the `gobuildcache` binary can reach the remote storage backend. For example, you could run it on your laptop and use regular S3, R2, or Tigris as the remote object storage solution. However, `gobuildcache` works best when the latency of remote backend operations (`GET` and `PUT`) is low, so for best performance we recommend using self-hosted CI running in AWS and targeting a S3OZ bucket in the same region as your CI runners.
406+
No, you can use `gobuildcache` any way you want as long as the `gobuildcache` binary can reach the remote storage backend. For example, you could run it on your laptop and use regular S3, R2, Tigris, or Google Cloud Storage as the remote object storage solution. However, `gobuildcache` works best when the latency of remote backend operations (`GET` and `PUT`) is low, so for best performance we recommend:
407+
408+
- **AWS**: Self-hosted CI running in AWS targeting an S3OZ bucket in the same region (and ideally same availability zone) as your CI runners
409+
- **GCP**: Self-hosted CI running in GCP targeting a GCS Regional Standard bucket in the same region as your CI runners. For even better performance, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) to get zonal read caching.
410+
411+
## Can I use Google Cloud Storage instead of S3?
412+
413+
Yes! `gobuildcache` supports Google Cloud Storage (GCS) as a backend. GCS is a good alternative to S3, especially if you're already using GCP infrastructure.
414+
415+
**Performance Considerations:**
416+
417+
- **Standard GCS**: While GCS doesn't have an exact equivalent to S3 Express One Zone's single-AZ storage, using GCS Regional Standard buckets in the same region as your compute provides good performance.
418+
419+
- **GCS with Anywhere Cache** (Recommended): For read-heavy workloads like build caches, enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) can significantly improve performance:
420+
- **Read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
421+
- **Cost savings**: Reduced data transfer costs and lower read operation costs
422+
- **Best for**: Workloads where the same cache objects are accessed multiple times (common in CI where multiple jobs may access the same artifacts)
423+
424+
Anywhere Cache is particularly effective when:
425+
- Your CI runners are in the same zone as the cache
426+
- You have high cache hit ratios (same objects accessed repeatedly)
427+
- Your bucket is in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)
428+
429+
- **Write latency**: GCS write latency may be higher than S3OZ, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance significantly.
430+
431+
**Recommendation**: If you're using GCP and want performance closer to S3OZ, use GCS Regional Standard buckets with Anywhere Cache enabled in the same zone as your CI runners. This provides excellent read performance while maintaining better durability than single-AZ storage.

go.mod

Lines changed: 33 additions & 1 deletion
@@ -3,15 +3,21 @@ module github.com/richardartoul/gobuildcache
33
go 1.25
44

55
require (
6+
cloud.google.com/go/storage v1.40.0
67
github.com/DataDog/sketches-go v1.4.6
78
github.com/aws/aws-sdk-go-v2 v1.32.7
89
github.com/aws/aws-sdk-go-v2/config v1.28.7
910
github.com/aws/aws-sdk-go-v2/service/s3 v1.71.1
1011
github.com/gofrs/flock v0.13.0
1112
github.com/pierrec/lz4/v4 v4.1.23
13+
google.golang.org/api v0.170.0
1214
)
1315

1416
require (
17+
cloud.google.com/go v0.112.1 // indirect
18+
cloud.google.com/go/compute v1.24.0 // indirect
19+
cloud.google.com/go/compute/metadata v0.2.3 // indirect
20+
cloud.google.com/go/iam v1.1.7 // indirect
1521
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.7 // indirect
1622
github.com/aws/aws-sdk-go-v2/credentials v1.17.48 // indirect
1723
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.22 // indirect
@@ -27,6 +33,32 @@ require (
2733
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.7 // indirect
2834
github.com/aws/aws-sdk-go-v2/service/sts v1.33.3 // indirect
2935
github.com/aws/smithy-go v1.22.1 // indirect
36+
github.com/felixge/httpsnoop v1.0.4 // indirect
37+
github.com/go-logr/logr v1.4.1 // indirect
38+
github.com/go-logr/stdr v1.2.2 // indirect
39+
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
40+
github.com/golang/protobuf v1.5.4 // indirect
41+
github.com/google/s2a-go v0.1.7 // indirect
42+
github.com/google/uuid v1.6.0 // indirect
43+
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
44+
github.com/googleapis/gax-go/v2 v2.12.3 // indirect
45+
go.opencensus.io v0.24.0 // indirect
46+
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.49.0 // indirect
47+
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
48+
go.opentelemetry.io/otel v1.24.0 // indirect
49+
go.opentelemetry.io/otel/metric v1.24.0 // indirect
50+
go.opentelemetry.io/otel/trace v1.24.0 // indirect
51+
golang.org/x/crypto v0.21.0 // indirect
52+
golang.org/x/net v0.22.0 // indirect
53+
golang.org/x/oauth2 v0.18.0 // indirect
54+
golang.org/x/sync v0.6.0 // indirect
3055
golang.org/x/sys v0.37.0 // indirect
31-
google.golang.org/protobuf v1.32.0 // indirect
56+
golang.org/x/text v0.14.0 // indirect
57+
golang.org/x/time v0.5.0 // indirect
58+
google.golang.org/appengine v1.6.8 // indirect
59+
google.golang.org/genproto v0.0.0-20240213162025-012b6fc9bca9 // indirect
60+
google.golang.org/genproto/googleapis/api v0.0.0-20240314234333-6e1732d8331c // indirect
61+
google.golang.org/genproto/googleapis/rpc v0.0.0-20240311132316-a219d84964c2 // indirect
62+
google.golang.org/grpc v1.62.1 // indirect
63+
google.golang.org/protobuf v1.33.0 // indirect
3264
)
