How PlantLab Knows When It Might Be Wrong: The reliability_score Field

The Short Version

PlantLab's API now returns a reliability_score field on every diagnosis: a number from 0 to 1 telling you how likely the answer is to be correct on this specific image. It replaces the old diagnostic_confidence and safety_classification fields, which were rule-based guesses that I never trusted. The new score is much better at flagging the diagnoses that turn out to be wrong – especially on the hard cases, which is where you actually need it. The schema version is bumped from 1.x to 2.0.0. If you're integrating with PlantLab today, the migration is a one-line change.


The problem with “confidence” fields

Most diagnosis APIs return a confidence number along with each answer. PlantLab did too. For every condition the model spotted, the response included a confidence value between 0 and 1. On top of that, the response carried two derived fields: diagnostic_confidence, a single overall trust number, and safety_classification, a three-way bucket of high, moderate, or low.

Those derived fields were a heuristic: a small handful of rules that mostly looked at the top condition's confidence and rolled it up into a number. Heuristics work fine when the problem is simple. They fall apart when the failure modes are subtle.

In real traffic, the cases that matter are the ambiguous ones – photos where the answer isn't obvious from the image alone, and a single rule isn't enough to capture how confident the diagnosis really is. That's the slice where a trust signal earns its keep, and the slice where a rule-based composite tends to break.

A trust signal that works on the easy cases and stops working on the harder ones isn't really a trust signal. It's a confidence display.


What reliability_score does differently

reliability_score is a single number from 0 to 1 that estimates how likely the top diagnosis is to be correct on this specific image. Higher is better. Below 0.3 is a clear “double-check this one.” Above 0.7 is “the system is confident and the confidence holds up.”

It doesn't replace per-class confidence. Those still tell you how strongly the model picked each individual condition. What reliability_score adds is a separate answer to a different question – “is the entire diagnosis trustworthy on this particular image, or is something off?”

The analogy I keep coming back to: a junior diagnostician who always gives an answer, and a supervisor who looks over their shoulder. The supervisor doesn't redo the diagnosis. They judge whether each one looks trustworthy. The old diagnostic_confidence was a checklist the junior filled in themselves. reliability_score is the supervisor.

I held the new score to a higher bar than the old composite. On the ambiguous cases, it does a much better job of flagging the answers you should double-check before acting on them. On the easy cases, both fields agree – which is the only place they were ever going to agree, and not where the score earns its keep.


What changes in the response

If you're integrating with PlantLab today, here's what your code currently sees:

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "schema_version": "1.2.0",
  "success": true,
  "is_cannabis": true,
  "is_healthy": false,
  "growth_stage": "flowering",
  "conditions": [
    { "class_id": "magnesium_deficiency", "confidence": 0.85 }
  ],
  "diagnostic_confidence": 0.85,
  "safety_classification": "high_confidence"
}

After the upgrade, that same image returns:

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "schema_version": "2.0.0",
  "success": true,
  "is_cannabis": true,
  "is_healthy": false,
  "growth_stage": "flowering",
  "conditions": [
    { "class_id": "magnesium_deficiency", "confidence": 0.85 }
  ],
  "reliability_score": 0.91
}

Two fields removed. One field added. The rest of the response is identical.
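
To make the diff concrete, here is a minimal Python sketch that compares the top-level keys of the two sample responses above. The dictionaries are trimmed copies of those samples, and the variable names are illustrative rather than part of the API.

```python
# Trimmed copies of the two sample responses shown above.
old_response = {
    "schema_version": "1.2.0",
    "conditions": [{"class_id": "magnesium_deficiency", "confidence": 0.85}],
    "diagnostic_confidence": 0.85,
    "safety_classification": "high_confidence",
}
new_response = {
    "schema_version": "2.0.0",
    "conditions": [{"class_id": "magnesium_deficiency", "confidence": 0.85}],
    "reliability_score": 0.91,
}

# Set difference on the key views shows what disappeared and what arrived.
removed = sorted(old_response.keys() - new_response.keys())
added = sorted(new_response.keys() - old_response.keys())

print("removed:", removed)  # removed: ['diagnostic_confidence', 'safety_classification']
print("added:", added)      # added: ['reliability_score']
```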

reliability_score is omitted when the API doesn't return a condition diagnosis – for example, when the photo isn't of cannabis, or when the plant is healthy. In those cases, there's no diagnosis to score for reliability, so the field doesn't appear. Treat its absence as “no score available” rather than “low score.”
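
One way to keep "absent" and "low" apart in client code is to read the field into an optional value. This is a sketch that assumes the response has already been parsed into a dictionary; get_reliability_score is a hypothetical helper name, not something the API or an SDK provides.

```python
from typing import Any, Mapping, Optional


def get_reliability_score(response: Mapping[str, Any]) -> Optional[float]:
    """Return the score as a float, or None when the field is absent."""
    score = response.get("reliability_score")
    if score is None:
        # No condition diagnosis was returned, so there is nothing to score.
        return None
    return float(score)


diagnosed = {"is_healthy": False, "reliability_score": 0.91}
healthy = {"is_healthy": True}

print(get_reliability_score(diagnosed))  # 0.91
print(get_reliability_score(healthy))    # None
```

Returning None instead of defaulting to 0.0 keeps a healthy plant from being rendered as a low-trust diagnosis.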


Migration

The change you make depends on what you were doing with the old fields.

If you were displaying diagnostic_confidence to a user, swap to reliability_score. Both fields run from 0 to 1 with higher meaning better, so the display logic carries over, and the new value is more accurate.

If you were branching on safety_classification strings, pick thresholds on reliability_score instead. A reasonable starting point: above 0.7 is “Confident,” 0.3 to 0.7 is “Uncertain,” below 0.3 is “Low confidence.” Your application can use whatever cutpoints make sense – the score is a number, not a string, so you have full flexibility.
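
The starting thresholds above translate into a few lines of Python. This is a sketch only: trust_label, its parameter names, and the "No score available" label are hypothetical. The 0.3 and 0.7 cutpoints are the suggested defaults, and the boundary values themselves fall in the middle bucket.

```python
from typing import Optional


def trust_label(score: Optional[float], low: float = 0.3, high: float = 0.7) -> str:
    """Map a reliability_score onto a display label using adjustable cutpoints."""
    if score is None:
        # The field was absent: no condition diagnosis to score.
        return "No score available"
    if score > high:
        return "Confident"
    if score < low:
        return "Low confidence"
    # Everything from low to high inclusive lands here.
    return "Uncertain"


print(trust_label(0.91))  # Confident
print(trust_label(0.50))  # Uncertain
print(trust_label(0.10))  # Low confidence
print(trust_label(None))  # No score available
```

Keeping the cutpoints as parameters makes it easy to tighten or loosen them per application without touching the branching logic.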

If you were ignoring the old fields entirely, there's nothing to migrate. If any leftover code still references diagnostic_confidence or safety_classification, remove it (those reads will come back null going forward) and you're done.

The Home Assistant integration shipped a new release the same day as the API change, so existing HA users get the new sensor automatically. If you're using a custom integration, update it before the next API deploy if you can – sensors that read the removed fields will return null until the integration is updated.


Why a breaking schema, not deprecation

I considered keeping diagnostic_confidence and safety_classification as deprecated fields, returning the old values alongside the new score for a release or two. It would have spared everyone a migration step.

But keeping both would have forced consumers to choose between two trust signals that can disagree. The old composite says “low confidence” on a photo where the new score says 0.95 – which do you trust? Worse, deprecated fields stick around for months, and integrators keep reading them instead of migrating. That's basically the entire failure mode of deprecation.

Cleaner break, single migration, no ambiguity. Schema bumped to 2.0.0 to make it loud. If your integration was on schema 1.x, you'll start getting 2.0.0 responses the next time you call the API. Field changes are documented above.
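
If you want your integration to notice the bump rather than discover it through missing fields, a small guard on schema_version is one option. The sketch below assumes the version string always has the MAJOR.MINOR.PATCH shape seen in the samples ("1.2.0", "2.0.0"); both function names are hypothetical.

```python
from typing import Any, Mapping


def schema_major(response: Mapping[str, Any]) -> int:
    """Parse the major version out of a 'MAJOR.MINOR.PATCH' schema_version string."""
    return int(str(response["schema_version"]).split(".")[0])


def uses_reliability_score(response: Mapping[str, Any]) -> bool:
    """True for 2.x and later responses, where the old trust fields are gone."""
    return schema_major(response) >= 2


print(uses_reliability_score({"schema_version": "1.2.0"}))  # False
print(uses_reliability_score({"schema_version": "2.0.0"}))  # True
```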


What's next

reliability_score ships as v1. The field semantics stay stable: a 0 to 1 trust score, present on diagnoses that returned a condition prediction. Future improvements land behind that contract. Same field, more accurate values, no code changes on your end.

If you migrate now, you're done with the migration.


PlantLab is free to try at plantlab.ai. Three diagnoses a day, results in milliseconds. The full API documentation, including the OpenAPI spec, lives at plantlab.ai/docs.


FAQ

Do I have to migrate immediately?

You'll start receiving schema 2.0.0 responses the next time you call the API. If your code reads diagnostic_confidence or safety_classification, those reads will return null. If your code branches on those fields, your branches will fall through to whatever default path you wrote. So the migration urgency depends on what your code does with null values – some integrations will degrade gracefully, others will break.
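
Here is a sketch of what that fall-through can look like. legacy_badge is a hypothetical piece of 1.x-era client code, not anything PlantLab ships; the only classification string it uses is "high_confidence", taken from the 1.2.0 sample response.

```python
from typing import Any, Mapping


def legacy_badge(response: Mapping[str, Any]) -> str:
    """Hypothetical 1.x-era code that branches on safety_classification."""
    classification = response.get("safety_classification")  # None on 2.0.0
    if classification == "high_confidence":
        return "green"
    # Default path: also taken when the field is simply missing.
    return "red"


v1 = {"schema_version": "1.2.0", "safety_classification": "high_confidence"}
v2 = {"schema_version": "2.0.0", "reliability_score": 0.91}

print(legacy_badge(v1))  # green
print(legacy_badge(v2))  # red
```

Nothing raises here. The 2.0.0 response simply lands on the default branch, which is why this kind of breakage can go unnoticed until someone looks at the output.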

Is reliability_score the same as confidence?

No. confidence (still present in conditions[] and pests[]) is the model's per-class probability for one specific class – “how confident am I that this leaf shows magnesium deficiency?” reliability_score is a separate signal that estimates how likely the entire diagnosis is to be correct on this image. The two answer different questions, and you can use both.
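
A minimal sketch of using both signals side by side follows. It assumes the "top" diagnosis is the entry in conditions[] with the highest confidence, which is my reading rather than a documented rule; summarize is a hypothetical helper.

```python
from typing import Any, Mapping


def summarize(response: Mapping[str, Any]) -> str:
    """Show the per-class confidence and the whole-diagnosis score together."""
    conditions = response.get("conditions") or []
    if not conditions:
        return "No condition diagnosis"
    # Assumption: the top diagnosis is the highest-confidence condition.
    top = max(conditions, key=lambda c: c["confidence"])
    score = response.get("reliability_score")
    reliability = "n/a" if score is None else f"{score:.2f}"
    return (
        f"{top['class_id']} "
        f"(class confidence {top['confidence']:.2f}, reliability {reliability})"
    )


sample = {
    "conditions": [{"class_id": "magnesium_deficiency", "confidence": 0.85}],
    "reliability_score": 0.91,
}
print(summarize(sample))
# magnesium_deficiency (class confidence 0.85, reliability 0.91)
```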

What does it mean when reliability_score is missing?

The score is only computed when the API returns a condition diagnosis – that is, when the photo is cannabis and the plant is unhealthy. For non-cannabis photos or healthy plants, there's no condition prediction to score, so the field is omitted. Treat absence as “no score available,” not as a low score.

How is this different from just thresholding on confidence?

Per-class confidence values are the model's individual outputs. They tell you which classes were predicted strongly. They don't tell you whether the diagnosis as a whole holds up on a given image. reliability_score answers that broader question, which is usually the one you actually have.

Can I see PlantLab's diagnosis history for my key?

GET /usage returns daily and monthly counts. For per-request lookup, store request_id from each diagnose response – it's stable, returned in both the JSON body and the X-Request-ID header. Use it for support tickets and feedback submission.
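
For storing the ID, a small helper that checks both places is enough. This sketch assumes you already have the parsed JSON body and a plain mapping of response headers from whatever HTTP client you use; extract_request_id is a hypothetical name.

```python
from typing import Any, Mapping, Optional


def extract_request_id(
    body: Mapping[str, Any], headers: Mapping[str, str]
) -> Optional[str]:
    """Prefer request_id from the JSON body; fall back to the X-Request-ID header."""
    request_id = body.get("request_id")
    if request_id:
        return str(request_id)
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


body = {"request_id": "550e8400-e29b-41d4-a716-446655440000"}
headers = {"x-request-id": "550e8400-e29b-41d4-a716-446655440000"}

print(extract_request_id(body, headers))
# 550e8400-e29b-41d4-a716-446655440000
```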


Related reading:

The Work Nobody Sees: How I Ran 47 Experiments to Make PlantLab's AI Better – What goes into making the model more accurate, cycle by cycle

Yellow Leaves, Seven Suspects: How PlantLab Got Specific About Nutrient Deficiencies – The nutrient subclassifier that ships alongside this trust signal

How PlantLab's AI Diagnoses 31 Cannabis Plant Problems in 18 Milliseconds – The full pipeline