Supported Model Providers ¶

AI Hypervisor inspects outbound calls from instrumented workloads to every major commercial model provider and to self-hosted inference endpoints. Provider detection is automatic; no per-provider configuration is required for discovery or observation.

This page lists the level of inspection AI Hypervisor performs for each supported provider, plus the configuration knobs that affect a given provider's traffic.

Provider support matrix¶

Provider	Discovery	Per-call inspection	PII detection	Spend tracking
Anthropic	✓	✓	✓	—
OpenAI	✓	✓	✓	—
AWS Bedrock	✓	✓	✓	✓
Azure OpenAI	✓	✓	✓	—
Google Gemini (incl. Vertex)	✓	✓	✓	—
Cohere	✓	✓	✓	—
Mistral	✓	✓	✓	—
Together AI	✓	✓	✓	—
Groq	✓	✓	✓	—
Hugging Face Inference	✓	✓	✓	—
Replicate	✓	✓	✓	—
Fireworks AI	✓	✓	✓	—
Perplexity	✓	✓	✓	—
DeepSeek	✓	✓	✓	—
Self-hosted (Ollama, vLLM, llama.cpp, TGI)	✓	✓	✓	—
Other (any unrecognized endpoint)	✓	Limited: request body recorded, response parsing best-effort	✓ on detected fields	—

Columns:

Discovery. AI Hypervisor recognizes the provider by request signature (host, path, headers, content shape) and classifies the asset in Registry.
Per-call inspection. Full parsing of prompt, response, model name, and token counts. Available for providers whose API schema is built in. Other providers fall back to "limited" mode where the request body is captured as-is.
PII detection. Scans prompt and response fields for the platform's PII patterns, regardless of parsing depth.
Spend tracking. Per-call cost attribution. Currently AWS Bedrock only; Bedrock exposes per-invocation cost in response metadata.

Provider-specific notes¶

Anthropic¶

Detected on outbound calls to api.anthropic.com (and customer-specific gateway hosts where the anthropic-version header is set). Full parsing of the Messages API, including streaming responses. Tool-use blocks are extracted and appear in User Tracks as tool_intent and tool_call steps.

OpenAI¶

Detected on outbound calls to api.openai.com. Full parsing of Chat Completions, Responses API, and Assistants API. Function-calling responses are extracted into tool-use steps in User Tracks. Embedding calls are detected and counted but do not generate per-step entries.

AWS Bedrock¶

Detected on outbound calls to bedrock-runtime.<region>.amazonaws.com. The full set of Bedrock-hosted models is supported: Anthropic Claude variants, Amazon Titan, Cohere Command, AI21 Jamba, Meta Llama, and Mistral models. Per-invocation cost is recorded and attributed to the originating user or session.

AI Hypervisor has two independent relationships with Bedrock:

Inspecting customer Bedrock traffic. When your agents call Bedrock, AI Hypervisor observes the call, parses the request and response, and tracks token usage and cost per call. This is the row above.
Internal Bedrock for AI-powered recommendations. The per-tenant backend optionally uses Bedrock-hosted Claude to generate remediation suggestions for findings. The call is made from your AWS account using EKS Pod Identity. Only finding metadata is sent; no finding payload leaves your cluster. Configurable in Settings.

You can use AI Hypervisor without ever calling Bedrock yourself, and you can disable internal Bedrock use without affecting customer-traffic inspection.

Azure OpenAI¶

Detected on outbound calls to *.openai.azure.com. The deployment name in the URL path is mapped to the underlying model when the platform can resolve it; otherwise the deployment name appears as the model identifier.

Google Gemini¶

Detected on outbound calls to generativelanguage.googleapis.com and Vertex AI endpoints (*-aiplatform.googleapis.com). Both the public Gemini API and Vertex-hosted Gemini deployments are supported.

Cohere¶

Detected on outbound calls to api.cohere.com and api.cohere.ai. The Chat, Generate, and Embed APIs are parsed.

Mistral¶

Detected on outbound calls to api.mistral.ai. The Chat Completions API and the Embeddings API are parsed.

Together AI¶

Detected on outbound calls to api.together.xyz and api.together.ai. The OpenAI-compatible Chat Completions API and Together's native model-endpoint API are parsed.

Other built-in providers¶

These providers have built-in detection signatures, so per-call inspection is full (not best-effort): Groq (api.groq.com), Hugging Face Inference (api-inference.huggingface.co), Replicate (api.replicate.com), Fireworks AI (api.fireworks.ai), Perplexity (api.perplexity.ai), DeepSeek (api.deepseek.com).

Self-hosted endpoints¶

Self-hosted inference endpoints are detected when the request and response schemas match a known framework: Ollama (default port 11434), vLLM in OpenAI-compatible mode (8000), llama.cpp (8080), and TGI's /generate endpoint. AI Hypervisor tags self-hosted models as such in Registry, so you can scope queries and reports to external providers only without including local inference.

Other providers¶

Providers not in the built-in list are still discovered and observed at the network and request-body level. Per-call parsing is best-effort: AI Hypervisor extracts prompt and response from common schemas, and falls back to recording the raw request body for PII detection and rule matching. To add full parsing for a provider not on the list, contact Wallarm.

How detection works¶

Provider detection is a runtime classification, not a configuration list. The HIGGS Scanner observes each outbound call on the instrumented pod and runs a fingerprinting pass:

The destination host is matched against a list of known provider domains.
The request signature (path, headers, content type, body schema) is matched against the provider's API patterns.
The result is recorded as asset class LLM with a provider subtype in Registry.

When a new provider domain appears that AI Hypervisor does not recognize, it is classified as unsanctioned shadow AI and surfaces in Findings under the Shadow risk column. Promote it to tolerated or sanctioned from Registry once you have decided on its governance state.