ML in the wild: 55k‑row system

A detailed engineering thread walked through building a production ML system from 55,000 patient CSV rows: they fixed noisy labels with clinical rules, reached 95.4% accuracy using logistic regression, and deployed with MLflow, DVC, FastAPI, and Kubernetes on AWS EKS. (Social briefing: production ML thread) (x.com).
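
The clinical-rule step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thread's code. The column names (`lab_value`, `diagnosis_codes`, `label`), the threshold, and the diagnosis code are all hypothetical placeholders. The rule only ever sets a label to positive. It leaves a row untouched when it does not fire, and it counts how many labels it changed so the cleaning step can be audited.

```python
from __future__ import annotations

import csv
import io

# Hypothetical rule parameters for illustration only; real values would come
# from clinicians or published guidelines.
LAB_THRESHOLD = 7.0   # hypothetical lab cutoff
POSITIVE_CODE = "E11"  # hypothetical diagnosis code


def rule_label(row: dict) -> int | None:
    """Return 1 if the explicit if/then rule fires, else None (no opinion)."""
    lab = row.get("lab_value", "").strip()
    codes = set(row.get("diagnosis_codes", "").split(";"))
    if (lab and float(lab) > LAB_THRESHOLD) or POSITIVE_CODE in codes:
        return 1
    return None


def clean_labels(rows: list[dict]) -> tuple[list[dict], int]:
    """Apply the rule to every row; keep the original label when it does not fire."""
    cleaned, changed = [], 0
    for row in rows:
        new_row = dict(row)
        ruled = rule_label(row)
        if ruled is not None and ruled != int(row["label"]):
            new_row["label"] = str(ruled)
            changed += 1
        cleaned.append(new_row)
    return cleaned, changed


# Tiny hypothetical CSV standing in for the raw patient rows.
SAMPLE = """patient_id,lab_value,diagnosis_codes,label
1,8.2,,0
2,5.1,E11;I10,0
3,5.0,I10,0
4,9.0,,1
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned, changed = clean_labels(rows)
print(changed)                          # 2
print([r["label"] for r in cleaned])    # ['1', '1', '0', '1']
```

Keeping the rule as a plain function makes it easy to review with clinicians and to rerun whenever the raw CSV changes.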

Clinical datasets often contain inconsistent or incorrect labels because records were entered for billing, clinical notes, or other workflows rather than for machine learning. Fixing those label errors by codifying simple, evidence‑based checks (for example: “if lab X > threshold OR diagnosis code Y is present, mark the row as positive”) changes which examples the model learns from and can substantially improve downstream performance. (nature.com) Operationally, “clinical rules” here means explicit if/then logic, derived from clinician knowledge or guidelines, that is run over the raw rows to correct or standardize labels before training. This is a rule‑based data‑cleaning step rather than an automated relabeling model, and it is a common strategy in healthcare pipelines for reducing errors introduced by automated label extraction. (glass.health) (academic.oup.com)

A logistic regression model is a linear statistical classifier that outputs a probability for each outcome. It is popular for two reasons. First, each coefficient is the change in log‑odds per unit change of its feature, which makes the model easy to interpret. Second, it trains quickly on tens of thousands of rows. Reporting “95.4% accuracy” means the model's predictions matched the cleaned labels on 95.4% of the evaluated rows. Accuracy can be misleading when classes are imbalanced, because a model that always predicts the majority class can score high while missing every positive case. For that reason, precision, recall, or AUC are often reported as well. (scikit-learn.org) (developers.google.com)

Putting a trained model into everyday use usually needs four things:
- a way to record every experiment and its metrics so attempts can be compared;
- a way to store exactly which version of the data and code produced a model;
- a web service that accepts inputs and returns predictions;
- infrastructure that runs and scales that service reliably. (mlflow.org) (doc.dvc.org)

On the engineering side, the usual wiring follows those four needs. Experiments are tracked and the chosen model is registered on an experiment platform so runs can be reproduced and rolled back. The CSV and feature pipeline are kept under data version control so the exact training inputs are preserved. The model is packaged as a REST endpoint for real‑time scoring. That endpoint is then deployed in containers orchestrated by Kubernetes so it can be updated and scaled. The tools named in the thread map onto those steps one to one: MLflow for experiment tracking, DVC for data versioning, FastAPI for the API layer, and EKS for managed Kubernetes. (mlflow.org) (doc.dvc.org) (fastapi.tiangolo.com) (docs.aws.amazon.com)

The full thread was not publicly reachable at the X link. This expansion therefore relies on the briefing card's summary, plus published literature and official tool documentation on how clinical rule cleaning, logistic regression scoring, and the MLflow/DVC/FastAPI/EKS stack are typically used. (x.com)
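
To make the logistic regression and accuracy points concrete, here is a small stdlib-only sketch. The intercept, coefficients, feature names, and class counts are hypothetical and say nothing about the thread's model. `predict_proba` applies the logistic (sigmoid) function to a weighted sum, so each coefficient is a change in log-odds. The metrics part shows why a figure of roughly 95% accuracy alone says little when positives are rare.

```python
import math

# Hypothetical fitted parameters: intercept and per-feature coefficients
# (log-odds change per unit increase of each feature).
INTERCEPT = -4.0
COEFS = {"lab_value": 0.5, "age_over_65": 1.2}


def predict_proba(features: dict) -> float:
    """P(y=1 | x) = sigmoid(intercept + sum(coef * feature))."""
    z = INTERCEPT + sum(COEFS[name] * features[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


def metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Score one hypothetical patient: z = -4.0 + 0.5*8.0 + 1.2*1 = 1.2
p = predict_proba({"lab_value": 8.0, "age_over_65": 1})
print(round(p, 3))  # 0.769

# Invented imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100
print(metrics(y_true, always_negative))  # (0.95, 0.0, 0.0)
```

In practice, scikit-learn's `LogisticRegression` (via its `predict_proba` method) and `sklearn.metrics` functions such as `precision_score`, `recall_score`, and `roc_auc_score` would replace these hand-written helpers. Writing them out just shows what the numbers mean.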

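The tracking and data-versioning steps share one idea: every training run should record exactly which data and settings produced which metrics. The stdlib-only sketch below shows that idea with two pieces: a content hash of the training CSV and an append-only JSON-lines run log. It is not MLflow's or DVC's API. The file names, record fields, the parameter `C`, and the metric values are hypothetical placeholders. In the real stack, DVC tracks data files by content hash, and MLflow records parameters, metrics, and models per run, with far more bookkeeping than this.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Content hash of a file; changes whenever any byte of the data changes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def log_run(log_path: Path, data_path: Path, params: dict, metrics: dict) -> dict:
    """Append one JSON line describing a training run (hypothetical schema)."""
    record = {
        "timestamp": time.time(),
        "data_file": data_path.name,
        "data_sha256": file_sha256(data_path),
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Demo in a temporary directory with a tiny hypothetical CSV and
# placeholder metric values.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "patients.csv"
    data.write_text("patient_id,label\n1,0\n2,1\n", encoding="utf-8")
    log = Path(tmp) / "runs.jsonl"
    first = log_run(log, data, {"C": 1.0}, {"accuracy": 0.90})
    # Simulate a relabeling pass that flips one label.
    data.write_text("patient_id,label\n1,1\n2,1\n", encoding="utf-8")
    second = log_run(log, data, {"C": 1.0}, {"accuracy": 0.92})
    n_runs = len(log.read_text(encoding="utf-8").splitlines())

print(n_runs)                                          # 2
print(first["data_sha256"] != second["data_sha256"])   # True
```

Because the hash changes when the relabeling step flips a single label, two runs trained on different data stay distinguishable even though they share a file name.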

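The REST-endpoint step is mostly a request/response contract: accept a JSON body of features, validate it, and return a probability. The thread used FastAPI. To keep this sketch dependency-free, it swaps in the standard library's WSGI interface, so it illustrates the contract rather than FastAPI's API. The route `/predict`, the feature names, and the coefficients are hypothetical.

```python
import io
import json
import math
from wsgiref.util import setup_testing_defaults

# Hypothetical model parameters (same shape as a fitted logistic regression).
INTERCEPT = -4.0
COEFS = {"lab_value": 0.5, "age_over_65": 1.2}
JSON_HEADERS = [("Content-Type", "application/json")]


def score(features: dict) -> float:
    """Logistic regression probability for one row of features."""
    z = INTERCEPT + sum(COEFS[k] * float(features[k]) for k in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


def app(environ, start_response):
    """WSGI app: POST /predict with a JSON body of features -> JSON probability."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", JSON_HEADERS)
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        missing = [k for k in COEFS if k not in payload]
        if missing:
            raise ValueError(f"missing features: {missing}")
        body, status = {"probability": score(payload)}, "200 OK"
    except (ValueError, TypeError, OverflowError) as exc:
        body, status = {"error": str(exc)}, "422 Unprocessable Entity"
    start_response(status, JSON_HEADERS)
    return [json.dumps(body).encode("utf-8")]


def call(method: str, path: str, payload: dict) -> tuple:
    """Invoke the app in-process with a fake request (no server needed)."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    result = b"".join(app(environ, start_response))
    return captured["status"], json.loads(result)


status, body = call("POST", "/predict", {"lab_value": 8.0, "age_over_65": 1})
print(status, round(body["probability"], 3))  # 200 OK 0.769
```

For local testing, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve the app over HTTP. A FastAPI version would typically declare a Pydantic model for the input and an `@app.post("/predict")` handler, then run under an ASGI server inside the container that Kubernetes schedules.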