Purpose: The Model Context Protocol (MCP) Service is a secure adapter that exposes a standard set of internal tools (e.g., queryUsers) to ai_service over MCP.
Criticality: High (Tier 2). If this service is down, the AI assistant cannot perform any actions.
Owners: Core Infrastructure Team (#infra-oncall on Slack).
Key Dependencies:
Internal: ai_service (the only client), auth_service, and other internal APIs that the tools connect to.
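For orientation, this is roughly what a tool invocation from ai_service looks like on the wire. MCP carries tool calls as JSON-RPC 2.0 messages using the tools/call method. The helper name build_tool_call and the email argument are hypothetical, since the real queryUsers input schema is not documented in this runbook.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the actual queryUsers schema may differ.
request = build_tool_call(1, "queryUsers", {"email": "user@example.com"})
print(json.dumps(request, indent=2))
```

When reading logs, the params.name value is the tool name that the alert and the log filter refer to.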
Meaning: A specific tool (e.g., manageUserGroup) is failing frequently.
Diagnosis:
Check the Grafana dashboard to identify which tool is failing.
Filter the logs for that tool: service:"mcp-service" AND tool_name:"manageUserGroup" AND outcome:"failure".
The log should contain the error returned from the downstream internal API (e.g., a 404 from auth_service because the user was not found).
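If the logs have been exported for offline analysis, the same filter can be approximated with a short script. This is a minimal sketch that assumes the export is JSON Lines with the service, tool_name, and outcome fields used in the query above. The downstream_service and status_code field names are hypothetical, so adjust them to whatever the real log schema uses.

```python
import json
from collections import Counter


def matches(entry, tool_name):
    """Mirror the log filter: mcp-service failures for one tool."""
    return (
        entry.get("service") == "mcp-service"
        and entry.get("tool_name") == tool_name
        and entry.get("outcome") == "failure"
    )


def summarize_failures(lines, tool_name):
    """Count failures per (downstream service, status code) pair."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        if matches(entry, tool_name):
            # Hypothetical field names; the real schema may differ.
            key = (
                entry.get("downstream_service", "unknown"),
                entry.get("status_code", "unknown"),
            )
            counts[key] += 1
    return counts


# Made-up sample data for illustration only.
sample_lines = [
    '{"service": "mcp-service", "tool_name": "manageUserGroup", "outcome": "failure", "downstream_service": "auth_service", "status_code": 404}',
    '{"service": "mcp-service", "tool_name": "manageUserGroup", "outcome": "success"}',
    '{"service": "mcp-service", "tool_name": "manageUserGroup", "outcome": "failure", "downstream_service": "auth_service", "status_code": 404}',
    '{"service": "mcp-service", "tool_name": "queryUsers", "outcome": "failure", "downstream_service": "auth_service", "status_code": 500}',
    "not json",
]

for (service, status), count in summarize_failures(sample_lines, "manageUserGroup").most_common():
    print(service, status, count)
```

Grouping by downstream service and status code makes it easier to see whether one dependency accounts for most of the failures.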
Resolution:
The fault almost always lies in the downstream service that the tool calls. Use the error message in the log to identify that service, then open its runbook (e.g., runbook-auth-service.md) and continue troubleshooting there.
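As a small convenience, the runbook lookup can be scripted. The sketch below assumes that every downstream runbook follows the naming pattern of the single example above (auth_service maps to runbook-auth-service.md). That convention is inferred, not confirmed, and the function name runbook_for is hypothetical.

```python
def runbook_for(downstream_service):
    """Guess a runbook filename from a service name.

    Assumes the pattern runbook-<service-with-hyphens>.md, inferred
    from one example; verify that the file exists before relying on it.
    """
    return "runbook-" + downstream_service.replace("_", "-") + ".md"


print(runbook_for("auth_service"))
```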