Trevino and Ra say that understanding and trusting what’s going on in the underlying model is especially important for research-specific LLMs. One challenge, says Trevino, is to “open up that black box a little bit” to better understand why the model answers the way it does. This could help to minimize hallucinations.

Indeed, one of Genentech’s motivations for building its LLM from scratch, Ra says, is that it wants to know it can trust and understand every bit of data that goes into it. “That’s incredibly important in an environment where we’re often dealing with privileged information or very sensitive information”, such as patient data, he says.

With off-the-shelf, ‘black box’ LLMs, it isn’t always clear how they are trained, Ra explains. “I think this has been a common criticism of some of the commercial LLM solutions, that oftentimes there’s not enough data transparency.”

Another persistent challenge, as in the LLM field as a whole, is bias in the underlying data. Groups that are under-represented in the training data will be misrepresented by the resulting model, and current genomic data hugely over-represent people of European descent. The solution, say Trevino and Vijay, is to improve the diversity of the underlying data. But there is no clear point at which those data become sufficiently diverse, they say.

Should these challenges be overcome, however, “there are going to be very real benefits” to these types of model, Trevino says. The important thing is “to make sure that that benefit is realized and maximally democratized”, and that the gain is worth all the work still left to do.