Interface does not load consistently
Incident Report for Merinio
Postmortem

Background:
On the afternoon of February 19th, 2021, our main API servers began returning errors at a significantly elevated rate, with extreme latency on the requests that did succeed. For about half an hour, services across our infrastructure were largely unresponsive: the app failed to load for many users, and page transitions were substantially delayed.

Root Cause Analysis:
Our initial remedy was to expand the capacity of our production cluster and restart several key services. A thorough investigation of the logs then revealed that the request load on our systems had surged by approximately 3000%. This increase drove latency steadily higher until requests began timing out across the platform. We traced the surge to a recent modification to the bulk edit tool in Merinio 2.0. The change inadvertently caused the entire user list to refresh on every logged-in device for each individual user edit. Previously, the list refreshed only when a user changed pages after a resource had been modified.
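To illustrate why a per-edit refresh multiplies load so quickly, here is a minimal sketch. The function name and the figures are hypothetical and are not taken from our logs; the "previous behaviour" branch is a simplification that assumes each device reloads the list at most once after a bulk edit. The numbers are not meant to reproduce the 3000% figure above, only the shape of the growth.

```python
def refresh_requests(edits: int, devices: int, per_edit: bool) -> int:
    """Estimate user-list refresh requests caused by one bulk edit.

    Hypothetical model, for illustration only.
    """
    if edits <= 0:
        return 0
    if per_edit:
        # Every individual edit asks every logged-in device to reload the list.
        return edits * devices
    # Simplified previous behaviour: each device reloads at most once,
    # on its next page change after the modification.
    return devices


# Hypothetical figures, chosen only to show how the load scales.
before = refresh_requests(edits=200, devices=50, per_edit=False)
after = refresh_requests(edits=200, devices=50, per_edit=True)
print(before, after)  # 50 10000
```

In this model the load grows with the product of edits and devices rather than with the number of devices alone, which is why a single large bulk edit can saturate the API.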

Immediate Response:
To mitigate the immediate impact, we have temporarily disabled real-time updates for changes executed through the bulk edit tool. Our team is actively developing a more optimized solution to handle such scenarios efficiently.
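One possible shape for a more efficient approach is to coalesce the notifications from a bulk edit into a single refresh. The sketch below is illustrative only and is not the fix we have deployed or committed to; `RefreshNotifier`, `bulk_edit`, `user_edited`, and the `"refresh_user_list"` message are hypothetical names.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class RefreshNotifier:
    """Sends at most one refresh message per bulk edit (hypothetical sketch)."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send        # callback that pushes a message to clients
        self._batch_depth = 0    # > 0 while inside a bulk edit
        self._pending = False    # True if a refresh was deferred

    def user_edited(self, user_id: str) -> None:
        # user_id is unused here because clients reload the whole list.
        if self._batch_depth > 0:
            # Inside a bulk edit: remember that a refresh is due, send nothing.
            self._pending = True
        else:
            # A standalone edit still notifies clients immediately.
            self._send("refresh_user_list")

    @contextmanager
    def bulk_edit(self) -> Iterator[None]:
        self._batch_depth += 1
        try:
            yield
        finally:
            self._batch_depth -= 1
            if self._batch_depth == 0 and self._pending:
                # Flush one refresh for the whole batch.
                self._pending = False
                self._send("refresh_user_list")


sent: List[str] = []
notifier = RefreshNotifier(sent.append)
with notifier.bulk_edit():
    for uid in ("u1", "u2", "u3"):
        notifier.user_edited(uid)
print(len(sent))  # 1
```

With this pattern, three edits inside one bulk operation produce one refresh message instead of three, while edits made outside a bulk operation behave as before.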

Impact:
This incident significantly disrupted our users' operations and degraded their experience of the application. We deeply regret the inconvenience and interruption it caused.

Posted Feb 23, 2021 - 17:47 EST

Resolved
This incident has been resolved. We will continue to monitor the situation closely.
Posted Feb 19, 2021 - 18:22 EST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 19, 2021 - 15:36 EST
Investigating
We are currently investigating an issue where Merinio responds intermittently and sometimes does not load.
Posted Feb 19, 2021 - 15:11 EST
This incident affected: Web Application.