What Computer Vision Can (and Still Can't) Do in Food Sorting

The pitch usually goes like this: a vision systems rep walks your production floor, points at the line, and says "we can catch everything." Then they quote you $180,000 for hardware, installation, and the first year of software. You sign, wait three months, go live — and discover that "everything" had some important asterisks.

We've talked with enough QA managers at food manufacturers to know this story is common. The problem isn't that camera-based inspection is ineffective. It's that the technology's actual capability envelope is frequently misrepresented, and buyers end up with expensive systems that cover some of what they need and conspicuously miss the rest.

So here's an honest account of where current computer vision reliably delivers, where it struggles, and what the gap between the two means for how you should spec your inspection setup.

What Vision Systems Actually Do Well

Start with the strong cases, because they're genuinely impressive when conditions are right.

Surface color anomalies. This is probably the strongest application of camera-based food inspection today. A trained model running on properly calibrated line lighting can detect bruising on produce, mold spots on bakery products, and discoloration on poultry with high consistency at speeds exceeding 600 units per minute. The key qualifier is "properly calibrated": lighting whose spectrum suits the product's natural color range matters enormously. Under generic white fluorescent lighting, a bruise on a Fuji apple looks very different from the same bruise under 850 nm near-infrared. The camera will record either image without complaint; the trained model only performs on the kind of image it was trained on.

Dimensional and positional checks. Is the label in the right place? Is the fill level visually correct? Is the cap seated properly? These are geometric questions, and geometry is something vision systems handle well. They can measure label offset to within ±0.5 mm on a well-lit, consistent-speed line. They can detect when the visible fill level leaves more headspace than expected. They can flag caps that aren't fully seated. The underlying math here is straightforward: reference template comparison and edge detection, not the more complex semantic understanding required for defect classification.
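To make the edge-detection idea concrete, here is a minimal sketch of a label offset check on a single scanline. It assumes a dark package background and a bright label. The function names, the brightness step of 50, and the 0.2 mm-per-pixel scale are illustrative assumptions, not values from any particular system.

```python
def find_first_edge(profile, min_step):
    """Return the position of the first rising edge in a 1-D intensity profile.

    An edge is the first adjacent pixel pair whose brightness jump is at
    least min_step; the position is the midpoint between the two pixels.
    """
    for i in range(len(profile) - 1):
        if profile[i + 1] - profile[i] >= min_step:
            return i + 0.5
    return None


def label_offset_mm(profile, reference_edge_px, mm_per_px, min_step=50):
    """Offset of the detected label edge from its reference position, in mm."""
    edge = find_first_edge(profile, min_step)
    if edge is None:
        return None  # no edge found: treat as a missing label
    return (edge - reference_edge_px) * mm_per_px


# Hypothetical scanline: background around 30, label around 200.
scanline = [30, 32, 31, 29, 30, 33, 201, 199, 200, 202]
offset = label_offset_mm(scanline, reference_edge_px=3.5, mm_per_px=0.2)

# Compare against the ±0.5 mm tolerance mentioned above.
within_tolerance = offset is not None and abs(offset) <= 0.5
```

A production system would work on full 2-D images and calibrate the pixel scale against a known target, but the comparison against a reference position is the same idea.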

Print quality and barcode presence. Smeared date codes, missing lot numbers, unreadable barcodes — these are highly reliable detection targets because the expected output is precisely defined. A date code is either present and legible, or it isn't. A barcode either decodes cleanly, or it doesn't. Vision systems that are integrated with OCR and barcode reading modules can catch these at essentially any production speed if the camera resolution is matched to the character height.
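"Matched to the character height" comes down to simple arithmetic: how many pixels span one printed character. The sketch below works through it. The 20-pixel minimum is an assumed rule of thumb for illustration only; the real requirement depends on the OCR module, so treat that constant and the example numbers as hypothetical.

```python
MIN_PX_PER_CHAR = 20  # assumed rule of thumb, not a vendor specification


def pixels_per_character(sensor_px, field_of_view_mm, char_height_mm):
    """Pixels spanning one character's height.

    sensor_px and field_of_view_mm must be measured along the same axis
    as the character height.
    """
    mm_per_px = field_of_view_mm / sensor_px
    return char_height_mm / mm_per_px


def resolution_ok(sensor_px, field_of_view_mm, char_height_mm,
                  min_px=MIN_PX_PER_CHAR):
    """True when each character covers at least min_px pixels."""
    return pixels_per_character(
        sensor_px, field_of_view_mm, char_height_mm) >= min_px


# Hypothetical setup: 1200 pixels across a 150 mm field of view.
px_3mm = pixels_per_character(1200, 150, 3)  # 3 mm date code characters
px_2mm = pixels_per_character(1200, 150, 2)  # 2 mm lot code characters
```

With these example numbers, the 3 mm characters clear the assumed minimum and the 2 mm characters do not, which is the kind of mismatch worth catching before the camera is purchased.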

Packaging structural defects. Torn corner seals, unsealed flaps, collapsed or crushed packaging — again, geometric problems with well-defined failure modes. These are detectable at high reliability when the failure changes the product's silhouette or edge profile in a way the model has been trained to recognize.

Where the Technology Still Has Real Limits

None of the limitations below mean vision inspection doesn't work. They mean you should understand what you're buying before you buy it.

Subsurface defects. A camera sees surfaces, not interiors. Underfill in an opaque container, foreign objects embedded in a product, bacterial contamination with no visible surface expression — none of these are camera-detectable. This sounds obvious, but it's routinely oversold. If you need interior defect detection, you need X-ray or ultrasonic inspection, not a camera array. We're not saying cameras and X-ray are in competition; they're complementary. But a vision system sold as a contamination solution when your primary contamination concern is embedded foreign objects is the wrong tool.

Highly variable natural products. A camera model trained to flag bruising on a Granny Smith apple will have a harder time generalizing to a Honeycrisp under the same conditions. Produce color and surface texture variation across cultivar, season, and growing region is significant. Models built on insufficient training data — fewer than 300–500 images per defect class per product SKU — will generate high false-reject rates that make the economics of the system questionable. This isn't a fundamental barrier, but it does mean that the training data collection process matters as much as the camera hardware.
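A quick audit of the training set can surface this problem before go-live. The sketch below counts labeled images per (SKU, defect class) pair and reports the pairs that fall short. The threshold uses the lower end of the range above; the SKU names, class names, and counts are made up for illustration.

```python
from collections import Counter

MIN_IMAGES_PER_CLASS = 300  # lower end of the 300-500 range cited above


def coverage_gaps(labels, expected, minimum=MIN_IMAGES_PER_CLASS):
    """Return {(sku, defect_class): count} for every expected pair below minimum.

    labels is one (sku, defect_class) tuple per training image. Pairs in
    expected that have no images at all are reported with a count of 0.
    """
    counts = Counter(labels)
    return {key: counts[key] for key in expected if counts[key] < minimum}


# Hypothetical training set.
labels = (
    [("granny_smith", "bruise")] * 420
    + [("honeycrisp", "bruise")] * 85
)
expected = [
    ("granny_smith", "bruise"),
    ("honeycrisp", "bruise"),
    ("honeycrisp", "mold"),
]

gaps = coverage_gaps(labels, expected)
```

Here the audit would flag both Honeycrisp classes, including the one with no images at all, which a plain count of existing labels would silently skip.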

Defects that are context-dependent. Some defects are only meaningful in relation to other information. A correct label on the wrong product is a mispack — visually undetectable by a label-checking system unless it also reads and validates the SKU against an order manifest. Underweight product looks normal to a camera unless you've integrated a checkweigher feed. Vision inspection is good at catching physical anomalies; it's not good at validating that the right product is in the right package without additional data integration.
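The data integration described above can be sketched as a small validation step that combines the SKU decoded by the vision system with the order manifest and a checkweigher reading. All type names, field names, and values here are hypothetical; real systems would receive these over whatever interfaces the line controller exposes.

```python
from dataclasses import dataclass


@dataclass
class UnitReading:
    decoded_sku: str  # SKU read from the label by the vision system
    weight_g: float   # weight reported by the checkweigher feed


@dataclass
class OrderLine:
    expected_sku: str
    target_weight_g: float
    tolerance_g: float


def validate_unit(reading, order):
    """Return a list of reject reasons; an empty list means the unit passes."""
    reasons = []
    if reading.decoded_sku != order.expected_sku:
        reasons.append("mispack")  # label is fine, but on the wrong product run
    if reading.weight_g < order.target_weight_g - order.tolerance_g:
        reasons.append("underweight")
    return reasons


order = OrderLine(expected_sku="SKU-1001", target_weight_g=500.0, tolerance_g=5.0)

good = validate_unit(UnitReading("SKU-1001", 498.0), order)
wrong_label = validate_unit(UnitReading("SKU-2002", 501.0), order)
light = validate_unit(UnitReading("SKU-1001", 490.0), order)
```

Neither reject in this example is visible to the camera alone: both units would look like perfectly good packages.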

Novel defect types with no training data. This one comes up in contract packing environments more than anywhere else. A contract packer running 20 different SKUs for different customers may encounter a defect type on a new product that the model has never seen. Without line stop time for training data collection, the system will either miss the defect or require a conservative threshold that drives false rejects up. This isn't a bug — it's how supervised learning works. But it means you need a clear plan for model maintenance and retraining as your product set changes.

The Training Data Problem Is Often the Real Bottleneck

When vision inspection deployments underperform, the hardware is rarely the issue. More often, it's the training data. Consider a scenario we encountered working with a Midwest contract packer running a poultry line at 480 units per minute: the system was spec'd correctly, the lighting was well-designed, and the GigE Vision cameras were matched to the conveyor geometry. But the initial model was trained on 80 defect images collected over two shifts, because that's all the customer had available before go-live.

Eighty images per defect class is almost never enough for a production-grade model on food products. The defect type — in this case, torn packaging film at the seal edge — varied enough across pack sizes and film batches that the model's confidence threshold had to be set so low to catch real defects that false rejects ran at 3.2% for the first six weeks. At 480 units per minute, that's roughly 920 good units rejected per hour.
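The arithmetic behind that figure is worth keeping handy when evaluating a quoted false reject rate against your own line speed. A minimal sketch, using the numbers from this example (the function name is ours):

```python
def false_rejects_per_hour(units_per_minute, false_reject_rate):
    """Good units rejected per hour at a given line speed and false reject rate."""
    units_per_hour = units_per_minute * 60
    return units_per_hour * false_reject_rate


# 480 units per minute is 28,800 units per hour; 3.2% of that is about 920.
rejected = false_rejects_per_hour(480, 0.032)
```

Plugging in your own line speed makes it easy to see how quickly a seemingly small percentage turns into pallets of rework.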

The solution was additional data collection during production — deliberately inducing defects on pulled units and capturing them under production lighting — combined with synthetic augmentation to increase positional variety. After retraining on a 340-image dataset, the false reject rate dropped to under 0.4%. The hardware hadn't changed.
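For readers unfamiliar with augmentation for positional variety, the simplest form is translation: each captured image is shifted by a few pixels in every direction to produce extra training samples. The sketch below is an illustration of that idea only. It is not the pipeline used in this deployment, and the function names are our own.

```python
def shift_image(image, dx, dy, fill=0):
    """Translate a 2-D grayscale image (a list of rows) by dx columns and dy rows.

    Pixels shifted out of frame are dropped; vacated pixels are set to fill.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def positional_variants(image, max_shift):
    """All shifted copies of image within max_shift pixels, excluding the original."""
    variants = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if (dx, dy) != (0, 0):
                variants.append(shift_image(image, dx, dy))
    return variants


# Tiny stand-in for a defect image: one bright pixel in the center.
defect = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
variants = positional_variants(defect, max_shift=1)
```

Augmented copies add positional variety but not new defect appearances, which is why the additional real images collected under production lighting still mattered.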

Lighting Is Not an Afterthought

Lighting decisions made during installation have more impact on long-term detection performance than camera selection in most food inspection applications. The physics is straightforward: a machine learning model running on image data can only work with what the image contains. If the lighting creates shadows that vary by product thickness, if it creates specular reflections that obscure surface texture, or if the color temperature shifts across the day as ambient factory lighting changes, the model will degrade in ways that are hard to diagnose without careful logging.
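One way to make "careful logging" concrete is to track the brightness of a fixed reference patch in the frame, such as a section of conveyor or a mounted gray card, and flag when it drifts from a commissioning baseline. This is a minimal sketch under those assumptions; the patch location, baseline of 120, and tolerance of 5 gray levels are hypothetical.

```python
from statistics import mean


def patch_mean(image, top, left, size):
    """Mean brightness of a square patch in a 2-D grayscale image (list of rows)."""
    rows = image[top:top + size]
    return mean(px for row in rows for px in row[left:left + size])


def lighting_drifted(recent_readings, baseline, tolerance):
    """True when the average of recent patch readings departs from the
    commissioning baseline by more than tolerance gray levels."""
    return abs(mean(recent_readings) - baseline) > tolerance


# Hypothetical patch readings logged over two periods of a shift.
morning = [118, 119, 121]
afternoon = [131, 133, 132]

morning_alarm = lighting_drifted(morning, baseline=120, tolerance=5)
afternoon_alarm = lighting_drifted(afternoon, baseline=120, tolerance=5)
```

Logging these readings alongside reject counts makes it much easier to tell a lighting problem from a model problem when performance slips.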

The standard approach in well-deployed systems is controlled, closed-cavity illumination: a light tunnel around the camera field of view that eliminates ambient light interference. Combine this with a lighting geometry suited to the inspection task (coaxial illumination for flat-label inspection, diffuse dome lighting for produce surface inspection, backlit illumination for silhouette checks on transparent packaging) and you get consistent image data that a model can actually learn from.

What we're not saying here is that all food manufacturers need to buy a $40,000 custom light enclosure. For many applications, a well-positioned LED ring or bar light is sufficient. The point is that lighting requirements should be driven by the inspection problem, not selected from a default hardware spec sheet.

A Reasonable Expectation Framework

If you're evaluating vision inspection for the first time, here's how to frame your expectations against the technology's real capabilities:

Camera inspection is reliably good at: surface color defects on uniformly lit products, label and print quality checks, dimensional and positional measurements, seal geometry verification, and high-confidence presence/absence checks (cap, label, fill indicator).

Camera inspection requires careful setup and sufficient training data for: biological surface defects on naturally variable products, multi-defect simultaneous detection on complex packaging, and any application where defect appearance varies significantly by product SKU or lot.

Camera inspection cannot replace: metal detection, X-ray inspection for internal contaminants, checkweighing, or any inspection requirement based on properties that have no visible surface expression.

The companies that get the most value from vision inspection are not necessarily the ones with the highest-resolution cameras or the most sophisticated models. They're the ones who've defined their inspection problem clearly — specific defect types, specific line speeds, specific product variation range — before selecting hardware, and who've budgeted for the training data effort that makes the model actually work.

That framing is harder to put in a sales pitch than "we catch everything." But it's the conversation that leads to systems that work.
