Hugging Face offers an easy and unified access to serverless AI inference through multiple inference providers, like Together AI, Cerebras and Fireworks AI. More details in the Inference Providers documentation.
You can check available models for an inference provider by going to huggingface.co/models, clicking the "Other" filter tab, and selecting your desired provider. For example, you can find all Fireworks AI supported models here.
Important Note: The provider is set to "auto" in NLWeb, which will select the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers.
Billing is centralized on your Hugging Face account, no matter which providers you are using. You are billed the standard provider API rates with no additional markup - Hugging Face simply passes through the provider costs. Note that Hugging Face PRO users get $2 worth of Inference credits every month that can be used across providers.
With a single Hugging Face token, you can access inference through multiple providers. Your calls are routed through Hugging Face and the usage is billed directly to your Hugging Face account at the standard provider API rates.
Simply set the HF_TOKEN environment variable with your Hugging Face token, you can create one here: https://huggingface.co/settings/tokens.