- Implement GCS backend with full Backend interface support
- Add GCS configuration flags and environment variables
- Include integration tests for GCS backend
- Document GCS usage, authentication, and Anywhere Cache setup
- Update README with GCS performance guidance and best practices
GCS backend provides a GCP-native alternative to S3, with optional
Anywhere Cache support for improved read performance.
README.md (161 additions, 8 deletions)
Effectively, `gobuildcache` leverages S3OZ as a distributed build cache for concurrent `go build` or `go test` processes regardless of whether they're running on a single machine or distributed across a fleet of CI VMs. This dramatically improves CI performance for large Go repositories because each CI process will behave as if running with an almost completely pre-populated build cache, even if the CI process was started on a completely ephemeral VM that has never compiled code or executed tests for the repository before.
`gobuildcache` is highly sensitive to the latency of the remote storage backend, so it works best when running on self-hosted runners in AWS targeting an S3 Express One Zone bucket in the same region (and ideally same availability zone) as the self-hosted runners. That said, it doesn't have to be used that way. For example, if you're using Github's hosted runners or self-hosted runners outside of AWS, you can use a different storage solution like Tigris or Google Cloud Storage (GCS). For GCP users, enabling GCS Anywhere Cache can provide performance similar to S3OZ for read-heavy workloads. See `examples/github_actions_tigris.yml` for an example of using `gobuildcache` with Tigris.
# Quick Start
By default, `gobuildcache` uses an on-disk cache stored in the OS default temporary directory. This is useful for testing and experimentation with `gobuildcache`, but provides no benefits over the Go compiler's built-in cache, which also stores cached data locally on disk.
For "production" use-cases in CI, you'll want to configure `gobuildcache` to use S3 Express One Zone, Google Cloud Storage, or a similarly low latency distributed backend.
### Using S3
```bash
export GOBUILDCACHE_BACKEND_TYPE=s3
export GOBUILDCACHE_S3_BUCKET=$BUCKET_NAME
```
> **Note**: All configuration environment variables support both `GOBUILDCACHE_<KEY>` and `<KEY>` forms (e.g., both `GOBUILDCACHE_S3_BUCKET` and `S3_BUCKET` work). The prefixed version takes precedence if both are set. The prefixed form is recommended to avoid conflicts with other tools. If the prefixed variable is set to an empty string, it falls through to the unprefixed version (or default).
### Using Google Cloud Storage (GCS)
```bash
export GOBUILDCACHE_BACKEND_TYPE=gcs
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
```
GCS authentication uses Application Default Credentials. You can provide credentials in one of the following ways:
1. **Service Account JSON file** (recommended for CI): set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file.

2. **Attached service account** (when running on GCP):
```bash
# No credentials file needed - uses metadata service automatically
export GOCACHEPROG=gobuildcache
export GOBUILDCACHE_BACKEND_TYPE=gcs
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
go build ./...
go test ./...
```
3. **gcloud CLI credentials** (for local development):
```bash
gcloud auth application-default login
export GOCACHEPROG=gobuildcache
export GOBUILDCACHE_BACKEND_TYPE=gcs
export GOBUILDCACHE_GCS_BUCKET=$BUCKET_NAME
go build ./...
go test ./...
```
#### GCS Anywhere Cache (Recommended for Performance)
For improved performance, especially in read-heavy workloads, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache). Anywhere Cache provides an SSD-backed zonal read cache that can significantly reduce latency for frequently accessed cache objects.
**Benefits:**

- **Lower read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
- **Reduced costs**: Lower data transfer costs, especially for multi-region buckets, and reduced retrieval fees
- **Better performance**: Especially beneficial when multiple CI jobs access the same cached artifacts
- **Automatic scaling**: Cache capacity and bandwidth scale automatically based on usage
**Requirements:**

- Bucket must be in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)
- CI runners should be in the same zone as the cache for optimal performance
- Anywhere Cache is most effective for read-heavy workloads with high cache hit ratios
**Setup:**

1. Verify your bucket region/zone supports Anywhere Cache
2. Enable Anywhere Cache on your GCS bucket
3. Configure the cache in the same zone as your CI runners for best performance
4. Set admission policy to "First miss" for faster warm-up (caches on first access)
5. Configure TTL based on your needs (1 hour to 7 days, default 24 hours)
```bash
# Enable Anywhere Cache using gcloud CLI
# Replace ZONE_NAME with the zone where your CI runners are located
gcloud storage buckets anywhere-caches create gs://$BUCKET_NAME ZONE_NAME
```
- Anywhere Cache only accelerates reads. Writes still go directly to the bucket, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance.
- First-time access to an object will still hit the bucket (cache miss), but subsequent reads will be served from the cache.
- For best results, ensure your CI runners and cache are in the same zone.
For more details, including availability by region, see the [GCS Anywhere Cache documentation](https://cloud.google.com/storage/docs/anywhere-cache).
#### AWS Credentials Permissions
Your credentials must have the following permissions:
```json
}
```
#### GCS Credentials Permissions
Your GCS service account must have the following IAM roles or permissions:
- `storage.objects.create` - to upload cache objects
- `storage.objects.get` - to download cache objects
- `storage.objects.delete` - to delete cache objects (for clearing)
- `storage.objects.list` - to list objects (for clearing)
The simplest way is to grant the `Storage Object Admin` role to your service account. For more granular control, create a custom role with only the required permissions.
## Github Actions Example
See the `examples` directory for examples of how to use `gobuildcache` in a Github Actions workflow.
## Lifecycle Policies
It's recommended to configure a lifecycle policy on your storage bucket to automatically expire old cache entries and control storage costs. Build cache data is typically only useful for a limited time (e.g., a few days to a week), after which it's likely stale.
### S3 Lifecycle Policy
Here's a sample S3 lifecycle policy that expires objects after 7 days and aborts incomplete multipart uploads after 24 hours:
```json
{
  "Rules": [
    {
      "ID": "ExpireBuildCache",
      "Status": "Enabled",
      "Filter": {},
      "Expiration": {
        "Days": 7
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 1
      }
    }
  ]
}
```
### GCS Lifecycle Policy
For GCS, you can configure a lifecycle policy using `gsutil` or the GCP Console. Here's an example using `gsutil` that expires objects after 7 days:
```bash
cat > lifecycle.json <<EOF
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {"age": 7}
      }
    ]
  }
}
EOF
gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET_NAME
```
Or using the GCP Console, navigate to your bucket → Lifecycle → Add a rule → Set condition to "Age" of 7 days → Action to "Delete".
# Preventing Cache Bloat
`gobuildcache` performs zero automatic GC or trimming of the local filesystem cache or the remote cache backend. Therefore, it is recommended that you run your CI on VMs with ephemeral storage and do not persist storage between CI runs. In addition, you should ensure that your remote cache backend has a lifecycle policy configured like the one described in the previous section.
```bash
gobuildcache clear-local
gobuildcache clear-remote
```
The clear commands take the same flags / environment variables as the regular `gobuildcache` tool, so for example you can provide the `cache-dir` flag or `CACHE_DIR` environment variable to the `clear-local` command and the `s3-bucket` flag or `S3_BUCKET` environment variable (or `gcs-bucket`/`GCS_BUCKET` for GCS) to the `clear-remote` command.
# Configuration
| Flag | Environment Variable | Default | Description |
|------|----------------------|---------|-------------|
## Do I have to use `gobuildcache` with self-hosted runners in AWS and S3OZ?
No, you can use `gobuildcache` any way you want as long as the `gobuildcache` binary can reach the remote storage backend. For example, you could run it on your laptop and use regular S3, R2, Tigris, or Google Cloud Storage as the remote object storage solution. However, `gobuildcache` works best when the latency of remote backend operations (`GET` and `PUT`) is low, so for best performance we recommend:
- **AWS**: Self-hosted CI running in AWS targeting a S3OZ bucket in the same region (and ideally same availability zone) as your CI runners
- **GCP**: Self-hosted CI running in GCP targeting a GCS Regional Standard bucket in the same region as your CI runners. For even better performance, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) to get zonal read caching.
## Can I use Google Cloud Storage instead of S3?
Yes! `gobuildcache` supports Google Cloud Storage (GCS) as a backend. GCS is a good alternative to S3, especially if you're already using GCP infrastructure.
**Performance Considerations:**
- **Standard GCS**: While GCS doesn't have an exact equivalent to S3 Express One Zone's single-AZ storage, using GCS Regional Standard buckets in the same region as your compute provides good performance.

- **GCS with Anywhere Cache** (Recommended): For read-heavy workloads like build caches, enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) can significantly improve performance:
  - **Read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
  - **Cost savings**: Reduced data transfer costs and lower read operation costs
  - **Best for**: Workloads where the same cache objects are accessed multiple times (common in CI where multiple jobs may access the same artifacts)

  Anywhere Cache is particularly effective when:
  - Your CI runners are in the same zone as the cache
  - You have high cache hit ratios (same objects accessed repeatedly)
  - Your bucket is in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)

- **Write latency**: GCS write latency may be higher than S3OZ, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance significantly.
**Recommendation**: If you're using GCP and want performance closer to S3OZ, use GCS Regional Standard buckets with Anywhere Cache enabled in the same zone as your CI runners. This provides excellent read performance while maintaining better durability than single-AZ storage.