Definition
Model monitoring is the continuous process of observing a deployed model's behavior and performance in production. A model that scored well in testing can quietly decay once it faces real, shifting data, so monitoring exists to catch that degradation before it costs anything — by tracking predefined metrics and alerting when they cross a threshold.
What gets monitored
Monitoring spans three broad categories. Performance metrics like accuracy, precision, and recall measure prediction quality (when ground truth is available). Operational metrics such as inference latency, throughput, and GPU memory utilization measure system health. Data quality metrics check that incoming input still matches the expected schema and statistical distribution — a spike in missing values often signals a broken upstream pipeline rather than a model fault.
Monitoring versus observability
Monitoring is typically a subset of observability. Monitoring tells you that a metric crossed a threshold; observability helps you understand why — whether an accuracy drop came from a data shift, a new user segment, or a pipeline error. One of the most important things monitoring surfaces is drift, the slow change in input or relationships that erodes accuracy over time.
For a self-hosted deployment, monitoring is non-negotiable: you are the operator, so you own the alerts and the response. It closes the loop in MLOps, turning a static deployment into a system you can trust over time.
In Simple Terms
Model monitoring is the continuous process of observing a deployed model’s behavior and performance in production. A model that scored well in testing can quietly…
