- Implement GCS backend with full Backend interface support
- Add GCS configuration flags and environment variables
- Include integration tests for GCS backend
- Document GCS usage, authentication, and Anywhere Cache setup
- Update README with GCS performance guidance and best practices
GCS backend provides a GCP-native alternative to S3, with optional
Anywhere Cache support for improved read performance.
Changes to `README.md` (162 additions, 9 deletions):
Effectively, `gobuildcache` leverages S3OZ as a distributed build cache for concurrent `go build` or `go test` processes regardless of whether they're running on a single machine or distributed across a fleet of CI VMs. This dramatically improves CI performance for large Go repositories because each CI process will behave as if running with an almost completely pre-populated build cache, even if the CI process was started on a completely ephemeral VM that has never compiled code or executed tests for the repository before.
`gobuildcache` is highly sensitive to the latency of the remote storage backend, so it works best when running on self-hosted runners in AWS targeting an S3 Express One Zone bucket in the same region (and ideally same availability zone) as the self-hosted runners. That said, it doesn't have to be used that way. For example, if you're using Github's hosted runners or self-hosted runners outside of AWS, you can use a different storage solution like Tigris or Google Cloud Storage (GCS). For GCP users, enabling GCS Anywhere Cache can provide performance similar to S3OZ for read-heavy workloads. See `examples/github_actions_tigris.yml` for an example of using `gobuildcache` with Tigris.
# Quick Start
By default, `gobuildcache` uses an on-disk cache stored in the OS default temporary directory. This is useful for testing and experimentation with `gobuildcache`, but provides no benefits over the Go compiler's built-in cache, which also stores cached data locally on disk.
For "production" use-cases in CI, you'll want to configure `gobuildcache` to use S3 Express One Zone, Google Cloud Storage, or a similarly low latency distributed backend.
### Using S3
```bash
export GOCACHEPROG=gobuildcache
export BACKEND_TYPE=s3
export S3_BUCKET=$BUCKET_NAME
go build ./...
go test ./...
```
### Using Google Cloud Storage (GCS)
```bash
export BACKEND_TYPE=gcs
export GCS_BUCKET=$BUCKET_NAME
```

GCS authentication uses Application Default Credentials. You can provide credentials in one of the following ways:

1. **Service Account JSON file** (recommended for CI):

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
export GOCACHEPROG=gobuildcache
export BACKEND_TYPE=gcs
export GCS_BUCKET=$BUCKET_NAME
go build ./...
go test ./...
```

2. **Attached service account** (on GCE or GKE):

```bash
# No credentials file needed - uses metadata service automatically
export GOCACHEPROG=gobuildcache
export BACKEND_TYPE=gcs
export GCS_BUCKET=$BUCKET_NAME
go build ./...
go test ./...
```
3. **gcloud CLI credentials** (for local development):

```bash
gcloud auth application-default login
export GOCACHEPROG=gobuildcache
export BACKEND_TYPE=gcs
export GCS_BUCKET=$BUCKET_NAME
go build ./...
go test ./...
```
#### GCS Anywhere Cache (Recommended for Performance)
For improved performance, especially in read-heavy workloads, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache). Anywhere Cache provides an SSD-backed zonal read cache that can significantly reduce latency for frequently accessed cache objects.
**Benefits:**
- **Lower read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
- **Reduced costs**: Lower data transfer costs, especially for multi-region buckets, and reduced retrieval fees
- **Better performance**: Especially beneficial when multiple CI jobs access the same cached artifacts
- **Automatic scaling**: Cache capacity and bandwidth scale automatically based on usage
**Requirements:**
- Bucket must be in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)
- CI runners should be in the same zone as the cache for optimal performance
- Anywhere Cache is most effective for read-heavy workloads with high cache hit ratios
**Setup:**
1. Verify your bucket region/zone supports Anywhere Cache
2. Enable Anywhere Cache on your GCS bucket
3. Configure the cache in the same zone as your CI runners for best performance
4. Set the admission policy to "First miss" for faster warm-up (caches on first access)
5. Configure the TTL based on your needs (1 hour to 7 days, default 24 hours)
```bash
# Enable Anywhere Cache using gcloud CLI
# Replace ZONE_NAME with the zone where your CI runners are located
gcloud storage buckets anywhere-caches create gs://$BUCKET_NAME ZONE_NAME \
  --admission-policy=admit-on-first-miss \
  --ttl=24h
```

**Notes:**

- Anywhere Cache only accelerates reads. Writes still go directly to the bucket, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance.
- First-time access to an object will still hit the bucket (cache miss), but subsequent reads will be served from the cache.
- For best results, ensure your CI runners and cache are in the same zone.
For more details, including availability by region, see the [GCS Anywhere Cache documentation](https://cloud.google.com/storage/docs/anywhere-cache).
#### AWS Credentials Permissions
Your AWS credentials must have the following permissions:
```json
{
  …
}
```
#### GCS Credentials Permissions
Your GCS service account must have the following IAM roles or permissions:

- `storage.objects.create` - to upload cache objects
- `storage.objects.get` - to download cache objects
- `storage.objects.delete` - to delete cache objects (for clearing)
- `storage.objects.list` - to list objects (for clearing)

The simplest way is to grant the `Storage Object Admin` role to your service account. For more granular control, create a custom role with only the required permissions.
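If you take the custom-role route, a role definition limited to the permissions listed above might look like the following sketch (the role title, role ID, and project ID are illustrative placeholders; the file would be passed to `gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=role.yaml`):

```yaml
# Sketch of a custom role covering only the object permissions gobuildcache needs.
title: gobuildcache cache access
description: Minimal permissions for gobuildcache remote cache operations
stage: GA
includedPermissions:
- storage.objects.create
- storage.objects.get
- storage.objects.delete
- storage.objects.list
```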
## Github Actions Example
See the `examples` directory for examples of how to use `gobuildcache` in a Github Actions workflow.
## Lifecycle Policies
It's recommended to configure a lifecycle policy on your storage bucket to automatically expire old cache entries and control storage costs. Build cache data is typically only useful for a limited time (e.g., a few days to a week), after which it's likely stale.
### S3 Lifecycle Policy
Here's a sample S3 lifecycle policy that expires objects after 7 days and aborts incomplete multipart uploads after 24 hours:
```json
{
  "Rules": [
    {
      "ID": "ExpireCacheObjects",
      "Status": "Enabled",
      "Filter": {},
      "Expiration": {
        "Days": 7
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 1
      }
    }
  ]
}
```
### GCS Lifecycle Policy
For GCS, you can configure a lifecycle policy using `gsutil` or the GCP Console. Here's an example using `gsutil` that expires objects after 7 days:
```bash
cat > lifecycle.json <<EOF
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {"age": 7}
      }
    ]
  }
}
EOF
gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET_NAME
```
Or using the GCP Console, navigate to your bucket → Lifecycle → Add a rule → Set condition to "Age" of 7 days → Action to "Delete".
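A malformed lifecycle config is easy to ship by accident, so it can help to confirm the file parses as JSON before applying it. A minimal sketch (the temp file path is arbitrary, and `python3` is used here purely as a JSON validator):

```shell
# Write the lifecycle config, then confirm it is well-formed JSON before
# passing it to gsutil or the GCP Console.
cat > /tmp/gcs-lifecycle.json <<'EOF'
{
  "lifecycle": {
    "rule": [
      {"action": {"type": "Delete"}, "condition": {"age": 7}}
    ]
  }
}
EOF
python3 -m json.tool /tmp/gcs-lifecycle.json > /dev/null && echo "lifecycle config OK"
```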
# Preventing Cache Bloat
`gobuildcache` performs zero automatic GC or trimming of the local filesystem cache or the remote cache backend. Therefore, it is recommended that you run your CI on VMs with ephemeral storage and do not persist storage between CI runs. In addition, you should ensure that your remote cache backend has a lifecycle policy configured like the one described in the previous section.
```bash
gobuildcache clear-local
gobuildcache clear-remote
```
The clear commands take the same flags / environment variables as the regular `gobuildcache` tool, so for example you can provide the `cache-dir` flag or `CACHE_DIR` environment variable to the `clear-local` command and the `s3-bucket` flag or `S3_BUCKET` environment variable (or `gcs-bucket`/`GCS_BUCKET` for GCS) to the `clear-remote` command.
# Configuration
`gobuildcache` ships with reasonable defaults, but this section provides a complete overview of flags / environment variables that can be used to override behavior.
| Flag | Environment Variable | Default | Description |
|------|----------------------|---------|-------------|
| … | … | … | … |
## Do I have to use `gobuildcache` with self-hosted runners in AWS and S3OZ?
No, you can use `gobuildcache` any way you want as long as the `gobuildcache` binary can reach the remote storage backend. For example, you could run it on your laptop and use regular S3, R2, Tigris, or Google Cloud Storage as the remote object storage solution. However, `gobuildcache` works best when the latency of remote backend operations (`GET` and `PUT`) is low, so for best performance we recommend:
- **AWS**: Self-hosted CI running in AWS targeting an S3OZ bucket in the same region (and ideally same availability zone) as your CI runners
- **GCP**: Self-hosted CI running in GCP targeting a GCS Regional Standard bucket in the same region as your CI runners. For even better performance, consider enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) to get zonal read caching.
## Can I use Google Cloud Storage instead of S3?
Yes! `gobuildcache` supports Google Cloud Storage (GCS) as a backend. GCS is a good alternative to S3, especially if you're already using GCP infrastructure.
**Performance Considerations:**
- **Standard GCS**: While GCS doesn't have an exact equivalent to S3 Express One Zone's single-AZ storage, using GCS Regional Standard buckets in the same region as your compute provides good performance.
- **GCS with Anywhere Cache** (recommended): For read-heavy workloads like build caches, enabling [GCS Anywhere Cache](https://cloud.google.com/storage/docs/anywhere-cache) can significantly improve performance:
  - **Read latency**: Cached reads from the same zone can achieve single-digit millisecond latency, comparable to S3OZ for repeated access
  - **Cost savings**: Reduced data transfer costs and lower read operation costs
  - **Best for**: Workloads where the same cache objects are accessed multiple times (common in CI where multiple jobs may access the same artifacts)

  Anywhere Cache is particularly effective when:

  - Your CI runners are in the same zone as the cache
  - You have high cache hit ratios (same objects accessed repeatedly)
  - Your bucket is in a [supported region/zone](https://cloud.google.com/storage/docs/anywhere-cache#availability)

- **Write latency**: GCS write latency may be higher than S3OZ, but since `gobuildcache` performs writes asynchronously, this typically doesn't impact build performance significantly.

**Recommendation**: If you're using GCP and want performance closer to S3OZ, use GCS Regional Standard buckets with Anywhere Cache enabled in the same zone as your CI runners. This provides excellent read performance while maintaining better durability than single-AZ storage.