Key Takeaways
AI language models excel at tasks involving language manipulation - email summarization/generation, transcribing patient histories, etc. These tend to be high-reward, low-risk use cases.
More complex tasks like automated medical coding, data extraction, and brainstorming differentials carry higher risks of bias, false information, and over-reliance on "black box" outputs. Careful design by experienced AI teams is critical.
Emerging use cases, like Q&A over patient records, are promising but not ready for widespread use. Limitations around data formats, input size, and output quality need to be addressed.
Practitioners should feel empowered to adopt AI tools, but must critically evaluate each tool's strengths, limitations, and the team behind it. Don't assume AI performance - verify it!
Language Tasks: Email, Histories, Scribing
Large language models like GPT excel at what they were designed for - predicting the next most likely token (roughly, a word or piece of a word) in a sequence. By chaining these predictions, they can:
Take medical jargon and generate a client-friendly summary
Turn a verbal patient history, once transcribed to text, into a concise written version
Auto-generate medical record notes (SOAP) from an appointment transcript
These tend to be high reward (save time, improve communication) and low risk, as the vet has full context to fact-check outputs. There is some risk of bias creep, but the result is a net positive in most cases.
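To make "chaining next-token predictions" concrete, here is a toy sketch in Python. It is not how GPT works internally (real models use neural networks trained on enormous amounts of text); it only counts which word follows which in a tiny made-up corpus, then repeatedly picks the most likely next word. The corpus and the names `predict_next` and `generate` are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, made up for illustration. Real models train on vastly more text.
corpus = (
    "the dog has a fever . "
    "the dog has a cough . "
    "the cat has a fever . "
    "the dog has a fever ."
).split()

# Count which token follows each token (a simple bigram table).
next_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_counts[current][following] += 1


def predict_next(token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = next_counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(start, max_tokens=10):
    """Chain next-token predictions until a period or the length limit."""
    tokens = [start]
    while len(tokens) < max_tokens:
        nxt = predict_next(tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)
        if nxt == ".":
            break
    return " ".join(tokens)


print(generate("the"))  # prints: the dog has a fever .
```

Notice that the output is simply the most common pattern in the training text. That is why these tools are fluent, and also why a vet with full context should still fact-check what they produce.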
Categorization, Extraction & Differential Diagnosis
While it's tempting to throw a complex patient record or clinical question at a language model, the results can be misleading.
These models aren't designed for complex categorization (determining record sections), granular data extraction, or differential diagnosis. They'll confidently produce an output, but it may be biased, over-generalized, or just plain wrong.
Using separate ML models designed for the specific task (and trained on vet data) is critical here. Exposing sources and explaining the model's logic also helps vets gauge when to trust vs override the AI.
Q&A on Patient Records - Promising but Early
The holy grail is a system that can ingest a complex patient history (even long PDFs), intelligently index the information, and let the vet ask plain-language questions to quickly get answers.
This is an active area of research, with limits around:
Data format (can it read the EMR/PIMS data accurately?)
Size (14-page cap today)
Question complexity (can it break down and accurately answer multi-part questions?)
I'd be very cautious about claims here and probe into the team's experience, approach to testing and validation, ability to show sources, etc. Lots of potential, but easy to do wrong in ways that create unacceptable risks.
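For a feel of what "ingest, index, ask, and show sources" means, here is a deliberately naive Python sketch. It is an assumption-laden toy, not any vendor's method: it indexes each page by its words, ranks pages by keyword overlap with the question, and returns page numbers so the reader can check the source. The record text is invented, and the names `build_index`, `find_sources`, and `MAX_PAGES` are hypothetical.

```python
import re

MAX_PAGES = 14  # the size cap mentioned above, treated here as a hard limit


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(pages):
    """Map each 1-based page number to the set of words on that page."""
    if len(pages) > MAX_PAGES:
        raise ValueError(f"record has {len(pages)} pages; cap is {MAX_PAGES}")
    return {n: tokenize(text) for n, text in enumerate(pages, start=1)}


def find_sources(index, question, top_k=2):
    """Rank pages by how many question words they contain.

    Returns (page_number, overlap_count) pairs, best first, so the
    reader can open the cited page instead of trusting a bare answer.
    """
    q_words = tokenize(question)
    scored = [(n, len(q_words & words)) for n, words in index.items()]
    scored = [(n, s) for n, s in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Invented record text, for illustration only.
pages = [
    "Bella, 7 year old Labrador. Presented for vomiting and lethargy.",
    "Bloodwork: ALT elevated. Started on metronidazole.",
    "Recheck: vomiting resolved. Owner reports normal appetite.",
]
index = build_index(pages)
print(find_sources(index, "What medication was started?"))  # prints: [(2, 1)]
```

Even this toy shows the failure modes: it matches only exact words, so a question phrased with a synonym finds nothing, and a multi-part question may point to the wrong page. Those are the kinds of gaps worth probing when a tool claims to answer questions over records.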
Empowering Practitioners
The pace of AI development is breathtaking, and it's easy to oscillate between hype and fear. My goal is to empower practitioners to critically evaluate both the potential and pitfalls of AI tools.
Don't accept AI performance at face value - look at the team behind the tool, their domain experience, and approach to validation and continuous improvement. Probe into risks around bias, false outputs, and over-dependence.
But don't let caution turn into paralysis - the upside of improved efficiency, consistency, and quality of care is real. It just requires a measured, eyes-wide-open approach to separate value from vaporware.
Hopefully this helps provide a framework - let me know what other topics would be helpful to cover as we continue the series! And if you're doing something interesting with AI in vet med, I'd love to hear about it.
Video Deep Dive
Feel free to watch the original video presentation and leave questions as comments on the video! I'll make sure to answer in subsequent videos.