WebGrab is a small Go scraping library built on top of goquery.
It provides a simple struct-tag API for:
- scraping from URLs, readers, or existing `goquery` documents
- extracting text, HTML, or attributes
- converting values into strings, numbers, booleans, times, slices, nested structs, or custom types
- handling optional fields, fallback selectors, URL resolution, hooks, retries, and strict/permissive list parsing
For the full guide, advanced features, and detailed examples, see docs/WIKI.md.
```sh
go get github.com/aljoni/webgrab
```

```go
package main

import (
	"fmt"

	"github.com/aljoni/webgrab"
)

type Page struct {
	// Use the first <h1>; fall back to <title> if no <h1> matches.
	Title    string   `grab:"h1||title"`
	Keywords []string `grab:"meta[name=keywords]" attr:"content" extract:"[^,]+"`
	// A missing .author element is allowed and replaced by the default.
	Author string `grab:".author" optional:"true" default:"Unknown"`
}

func main() {
	grabber := webgrab.New()

	var page Page
	// Fetch the URL and populate page from the parsed HTML.
	if err := grabber.Grab("https://example.com", &page); err != nil {
		panic(err)
	}

	fmt.Println(page.Title)
	fmt.Println(page.Keywords)
	fmt.Println(page.Author)
}
```

Commonly used struct tags:

- `grab:"selector"`: CSS selector to scrape. Use `||` for fallbacks.
- `attr:"href"`: read an attribute instead of text.
- `extract:"regexp"`: keep the first capture group.
- `filter:"regexp"`: keep only matching values.
- `context:"selector"`: scope nested structs or repeated items.
- `optional:"true"`: leave missing values at their zero value.
- `default:"value"`: fallback for optional fields.
- `resolve:"url"`: resolve relative links against the page URL.
- `layout:"..."`: time parsing layout for `time.Time`.
- `sep:"..."`: join multiple matches into one scalar field.
More tags and behavior notes are covered in docs/WIKI.md.
Built-in support includes:
- `string`
- `bool`
- `int`, `uint`, `float64`, and the other standard integer and float variants
- `time.Time`
- slices of the supported scalar types
- nested structs and slices of structs
You can also register custom converters per Grabber. See the custom type converters section in the wiki.
The main `Grabber` methods are:
- `Grab(url, &dst)`: fetch a page and scrape it
- `GrabReader(baseURL, reader, &dst)`: scrape existing HTML from a reader
- `GrabDocument(baseURL, doc, &dst)`: scrape an existing `goquery.Document`
Grabber also supports:
- a custom `HTTPClient`
- `BeforeRequest` and `AfterResponse` hooks
- allowed non-200 status codes
- retry configuration
- optional `robots.txt` enforcement for `Grab`
- `StrictMode` for slices of structs
See the wiki sections on HTTP behavior, hooks and retries, and strict mode for details.
WebGrab exposes typed errors:
- `FieldError`
- `StatusError`
- `RequestError`
See the errors section in the wiki for examples using `errors.As`.