One way we've seen teams avoid the PHI problem is by generating synthetic equivalents of the data you're trying to train on. For example, with an output table like the one you showed (age, address, etc.), you can start from those variables and then use an LLM to generate transcripts/reports that reflect that patient.
This won't give you any new information from the real data, but it does help you build a model that's better at complex medical NER-style tasks like the ones you mentioned!
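Here's a rough sketch of that workflow in Python. It's illustrative only and rests on assumptions: `PatientRecord`, `build_prompt`, `label_entities`, and `make_example` are hypothetical names, and `generate` stands in for whatever LLM client you use (no specific API is implied). One way to turn the notes into NER training data is to exploit the fact that you seeded the field values yourself. That means you can locate them in the generated text and get entity spans without touching real PHI.

```python
import re
from dataclasses import dataclass, asdict
from typing import Callable

@dataclass
class PatientRecord:
    # Hypothetical schema mirroring the structured output table (all values synthetic).
    name: str
    age: int
    address: str
    diagnosis: str

def build_prompt(record: PatientRecord) -> str:
    # Ask the LLM to use every seeded value verbatim so it can be located afterwards.
    fields = "\n".join(f"- {key}: {value}" for key, value in asdict(record).items())
    return ("Write a short clinical note for a fictional patient. "
            "Use each of these values verbatim:\n" + fields)

def label_entities(text: str, record: PatientRecord) -> list[tuple[int, int, str]]:
    # Return (start, end, LABEL) character spans for every exact occurrence of a
    # seeded value; the lookarounds stop "54" from matching inside "154".
    spans = []
    for field, value in asdict(record).items():
        pattern = r"(?<!\w)" + re.escape(str(value)) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), field.upper()))
    return sorted(spans)

def make_example(record: PatientRecord, generate: Callable[[str], str]) -> dict:
    # `generate` is a placeholder for a real LLM call: prompt in, note text out.
    text = generate(build_prompt(record))
    return {"text": text, "entities": label_entities(text, record)}

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model; ignores the prompt and returns a canned note.
    return ("Jane Doe, a 54-year-old living at 12 Oak Lane, "
            "presented with type 2 diabetes.")

record = PatientRecord("Jane Doe", 54, "12 Oak Lane", "type 2 diabetes")
example = make_example(record, fake_llm)
for start, end, label in example["entities"]:
    print(label, example["text"][start:end])
```

Exact matching misses cases where the model paraphrases a value (e.g. "fifty-four" instead of "54"). In practice you'd probably filter out or regenerate any note where a seeded value can't be found, rather than keep partially labeled examples.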