Osuite Cloud

Alerts

Last updated on March 20, 2026

Overview

Alerts in Osuite let you define conditions on your metrics and get notified the moment something crosses a threshold.

Creating an alert

Navigate to Alerts → New Alert in the Osuite UI.

Step 1: Define PromQL query and expression

Write a PromQL query and an expression that applies a threshold to it. The query is referenced by a letter (here, A) in the expression:

http_server_request_duration_seconds{service="payment-service"}
A > 10
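A raw gauge comparison like the one above can be noisy in production. A variant based on rates over a window is often more robust; the metric and label names below are assumptions for illustration, using standard PromQL functions:

```promql
# Hypothetical: p99 request latency for payment-service over the
# last 5 minutes (assumes a histogram metric with _bucket series)
histogram_quantile(0.99, sum by (le) (rate(http_server_request_duration_seconds_bucket{service="payment-service"}[5m])))
A > 2
```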

Step 2: Define severity & evaluation interval

Assign a severity to the alert so on-call engineers know how urgently to respond:

  • Critical — Production is down or severely degraded
  • Warning — Something is off, but not yet user-impacting
  • Info — Informational threshold for awareness

Next, set an evaluation interval (how often the expression is checked) and a duration. The duration avoids false positives from momentary spikes: Osuite only fires when the condition has been continuously true for the full window.
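Conceptually, a 5-minute duration behaves like requiring the condition over the entire range. In standard PromQL this can be sketched with min_over_time (metric and label names are illustrative, reusing the earlier example):

```promql
# Holds only if every sample in the last 5 minutes exceeded 10
min_over_time(http_server_request_duration_seconds{service="payment-service"}[5m]) > 10
```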

Save the alert.

Step 3: Configure notifications

You can configure notification channels in “Settings > General > Notifications”.

Channel     Description
Slack       Send a message to the configured Slack channel.
Opsgenie    Create an Opsgenie incident.

Managing alerts

Alert list

All your alerts are visible at Alerts → All Alerts with their current state:

State       Meaning
Normal      Condition is not met; the system is healthy
Firing      Condition has been met
Alerting    Condition has been continuously true for the full window

Muting alerts

You can mute any alert for a defined window — useful during planned maintenance, deployments, or load tests where you expect elevated error rates. Navigate to the alert and click Pause.


Best practices

  • Start with the error rate, then tune. The most valuable first alert for any service is an error rate threshold. Start at 1% and observe for a few days before tightening.

  • Use the pending duration to avoid alert fatigue. A momentary error spike at 2% for 30 seconds is often not worth waking someone up. Requiring 5 consecutive minutes filters the noise without losing real incidents.

  • Correlate alert thresholds with your SLOs. If your SLO is 99.9% success rate, set a warning at 0.5% error rate and a critical at 1%. This gives you time to respond before breaching the SLO.

  • Name alerts clearly. Use names that describe the impact: "High error rate – payment-service" is more useful than "payment_service_errors_alert" when you’re being paged at 3am.
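The SLO-driven thresholds described above can be expressed as an error-ratio query. The metric and label names here are assumptions for illustration:

```promql
# Fraction of requests returning 5xx over the last 5 minutes
sum(rate(http_requests_total{service="payment-service", status=~"5.."}[5m]))
  / sum(rate(http_requests_total{service="payment-service"}[5m]))
# Warning when the ratio exceeds 0.005 (0.5%), critical above 0.01 (1%)
```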