3 consecutive bad days is the default. Working in finance? Bump it to 5. Running a startup? 2 might be generous enough.
The default threshold of three consecutive bad days wasn't chosen arbitrarily. It reflects a specific philosophy about what constitutes a pattern versus noise:
One bad day is noise. Everyone has them. External pressure, poor sleep, a bad phone call before they walked in the door. A single incident says almost nothing about a person.
Two consecutive bad days is worth noting. Something is going on. But it's still not enough to act on: it could be circumstantial, and one good day sits between them and a clean bill of health.
Three in a row is a choice. At this point, the behavior is no longer being overridden by their better nature on a given day. It's what you're getting. The question is whether it's acceptable.
The right threshold depends heavily on the relationship, the environment, and what the stakes are.
The flag threshold (the score at which a day starts counting as bad) is 0.55. The reset threshold (the score a day must come down to before the counter resets) is 0.30. This 0.25 gap is intentional.
Without the gap, a subject could game the system with minimal compliance — one slightly-below-threshold day every few days to prevent the streak from building. The lower reset threshold requires genuinely decent behavior to clear a flag, not just technically-not-flagging behavior.
In practice: if someone is bad enough to score 0.54 on their "good" days, the counter doesn't reset. That's still a problem. The score has to come down to 0.30, which requires actually being decent.
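The two-threshold behavior described above can be sketched as a small streak counter. This is a minimal illustration, not the tool's actual implementation: the function names are hypothetical, and it assumes that a score exactly at 0.55 counts as bad, that a score exactly at 0.30 resets, and that a score in the gap leaves the counter unchanged.

```python
from typing import Iterable

FLAG_THRESHOLD = 0.55    # from the text: a day scoring this high counts as bad
RESET_THRESHOLD = 0.30   # from the text: a day scoring this low clears the counter
CONSECUTIVE_DAYS = 3     # from the text: default streak length


def update_streak(streak: int, score: float) -> int:
    """Return the new streak count after one day's score (hypothetical helper)."""
    if score >= FLAG_THRESHOLD:    # assumption: boundary value counts as bad
        return streak + 1
    if score <= RESET_THRESHOLD:   # assumption: boundary value resets
        return 0
    # In the gap: not bad enough to count, not good enough to reset.
    return streak


def is_flagged(scores: Iterable[float], consecutive_days: int = CONSECUTIVE_DAYS) -> bool:
    """True once the streak reaches the configured number of bad days."""
    streak = 0
    for score in scores:
        streak = update_streak(streak, score)
        if streak >= consecutive_days:
            return True
    return False


# A 0.54 day between bad days does not break the streak...
print(is_flagged([0.60, 0.54, 0.60, 0.54, 0.60]))
# ...but a genuinely decent day does.
print(is_flagged([0.60, 0.60, 0.25, 0.60]))
```

Under these assumptions, the first sequence still reaches three counted bad days because the 0.54 days only pause the counter, while the 0.25 day in the second sequence clears it.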
Beyond consecutive days, two score thresholds are configurable:
--day-threshold N: the BSA score at which a day counts as "bad." Default 0.55. Lower this if you want to catch subtler patterns; raise it if you want to filter out more noise.
--confidence-min N: the minimum confidence score to actually flag someone. Default 0.70. Raise this for fewer, higher-certainty flags. Lower it if you want to surface borderline cases for manual review.
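For readers wiring these options into their own scripts, here is one way the two flags could be parsed with Python's standard `argparse`. This is a sketch only: the flag names and defaults come from the text above, but the function names and the check that values fall between 0 and 1 are assumptions, and the real tool's parser may differ.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the two documented score thresholds."""
    parser = argparse.ArgumentParser(description="Score threshold options (illustrative)")
    parser.add_argument(
        "--day-threshold", type=float, default=0.55, metavar="N",
        help="BSA score at which a day counts as bad",
    )
    parser.add_argument(
        "--confidence-min", type=float, default=0.70, metavar="N",
        help="minimum confidence score required to flag someone",
    )
    return parser


def parse_thresholds(argv: List[str]) -> argparse.Namespace:
    """Parse argv and reject values outside 0..1 (the range is an assumption)."""
    args = build_parser().parse_args(argv)
    for name in ("day_threshold", "confidence_min"):
        value = getattr(args, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return args


# Defaults apply when a flag is omitted.
args = parse_thresholds(["--confidence-min", "0.85"])
print(args.day_threshold, args.confidence_min)
```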