Toxicologists traditionally use animals to test the toxicity of chemicals and other substances. But the brand-new field of toxicogenomics, which applies a whole-genome approach to toxicology questions, is changing all that. Still in its infancy, this field is destined to change the way toxicologists think and act and could even help optimize the drug development process.
“It holds great promise for the future,” says Michigan State University toxicologist Jay Goodman. “Toxicogenomics is a tool that can improve the assessment of potential toxicity.” Phil Iannaccone, a researcher at Northwestern University Medical School and invited author of a recent Environmental Health Perspectives editorial on the new technology, agrees: “The hope is that the observed patterns will be characteristic of a class of toxicants, such as polycyclic aromatic hydrocarbons versus peroxisome proliferators. Eventually, one might hope for specificity allowing actual identification of the chemical,” he says. “For now, it is exciting enough that one might be able to determine if an unknown chemical is likely to behave as a certain class of toxicants or not.”

The new field is based on the premise that tumours, disease, and other physical responses to toxic chemicals find their origins in gene expression, which depends on the environment—chemical or otherwise. Determining how genes respond to a toxicant, then, could be a direct measure of toxicity. It’s faster, cheaper, and more accurate than animal testing, which often takes years to perform; examining gene expression takes only days or months.
That’s welcome news: over 80,000 substances are in commercial use, including drugs, food additives, cosmetics, and industrial chemicals, but only a fraction have been thoroughly tested for toxicity, says Michael McClure, chief of the Organs and Systems Toxicology Branch, Office of Program Development, at the National Institute of Environmental Health Sciences (NIEHS). Toxicogenomics could accelerate the speed at which these substances are tested.

Microarrays Fit the Bill
The toxicogenomicist’s key tool is the microarray, which allows scientists to simultaneously assess the expression of thousands of genes and produce a profile, or signature, for each toxin. Like traditional biomarkers, such as blood enzyme levels, microarray profiles can flag substances as hazardous, but this new biomarker provides greater precision in identifying hazards. Each chip’s probe set is manufacturer-dependent (see sidebar), but probe sets typically include genes implicated in DNA replication and repair and in apoptosis; transcription factors; signalling molecules; and genes known to respond to various cellular insults.
Toxicologists use these microarrays in several ways. Instead of groping in the dark for a few genes that might be involved in the toxicant response, scientists can use toxicogenomics to get a complete picture of all affected genes and then focus on the promising candidates. This knowledge provides clues as to why animals develop cancer, liver damage, heart problems, or birth defects in response to a toxin.
Researchers can also screen an environment for toxins by assessing the gene expression characteristics of organisms living in that environment. They can measure the gene expression profiles of clinical trial participants to determine a drug’s toxicity, its effects on the body, and the effects of different dosages.
These applications could change the way pharmaceutical companies operate. Drug discovery methods such as combinatorial chemistry have greatly increased the number of drug leads, but taking one compound through the development process often costs millions of dollars. If that compound turns out to be toxic during clinical trials, the money is wasted. By running the compound through microarray testing early in the process, pharmaceutical companies can reduce R&D costs.
A Complement, not a Replacement
The NIEHS established the National Center for Toxicogenomics (NCT) in September 2000. According to the Center’s mission statement, its goal is “to use the methodologies and information of genomics science to significantly improve our understanding of basic biological responses to environmental stressors and toxicants.”
NCT director Raymond Tennant explains that toxicogenomics fundamentally changes the way toxicologists carry out their work: “In football, you try to identify the person carrying the ball and tackle him. With global gene expression, you can tackle the entire football team at one time and throw out the players until you find the ball.”
Of course, it helps to have the entire team on one chip. With all kinds of genes assembled on one array, researchers can exclude the genes that don’t play a role in the toxic response. However, knowing which genes change their expression in the presence of toxicants is of only limited value. Scientists must be able to correlate these changes with something tangible, such as a tumour. Otherwise, the fact that certain genes become up- or down-regulated doesn’t mean much.
“If you take a pesticide and throw it at a cell culture, you’re going to get a response, but you don’t know what the significance of the response is,” says Peter Spencer, director of the Center for Research on Occupational and Environmental Toxicology at Oregon Health Sciences University. Goodman adds, “To make progress, this new tool must be linked to basic principles of toxicology—for example, dose- and time-response relationships—and we need to understand that a simple change in gene expression is not necessarily indicative of toxicity.”
So, researchers will continue to perform animal testing while toxicologists anchor gene expression profiles to actual phenotypic effects. Starting with chemicals already known to cause cancer, a gene expression profile could be correlated with carcinogenicity. Toxicologists could then compare the signature of an unknown chemical to that of known toxicants. A matching signature then offers clues to that compound’s effects.
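To make the signature-matching idea concrete, here is a minimal sketch: it scores an unknown compound’s expression profile against reference toxicant signatures by Pearson correlation. The profiles, gene count, and class names are invented for illustration, not real data.

```python
# Hypothetical signature matching: correlate an unknown compound's
# expression profile with known toxicant-class signatures.
import numpy as np

rng = np.random.default_rng(0)
references = {                        # toxicant class -> reference signature
    "PAH": rng.normal(size=500),      # e.g. polycyclic aromatic hydrocarbons
    "peroxisome_proliferator": rng.normal(size=500),
}
unknown = references["PAH"] + rng.normal(scale=0.3, size=500)  # noisy PAH-like profile

scores = {name: np.corrcoef(unknown, sig)[0, 1] for name, sig in references.items()}
print(max(scores, key=scores.get))    # best-matching toxicant class
```

A high correlation with a known class would then point the toxicologist toward that class’s phenotypic effects and the most relevant follow-up assays.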
Bioassays that don’t use animals can also help interpret gene expression profiles, measure toxicant-induced DNA damage, and assess neurotoxicity, immunotoxicity, and reproductive, developmental, and genetic toxicity. The unknown chemical’s signature can help the researcher determine whether the toxicant induces a biological response; the most relevant bioassay can then be selected. This matters, as some bioassays are extremely expensive and time-consuming: the rodent cancer bioassay requires four years, 1,200 animals, and millions of dollars to execute and analyze.

A Sea of Data
Analyzing these results is not easy. Microarrays generate giant lists of numbers representing changes in gene expression. “It’s one thing to run an array. That’s the easy part,” Tennant says. “Coming to understand what those data are telling you is a substantial effort.” That’s where bioinformatics comes in, says McClure. Bioinformaticians can help toxicologists draw meaningful conclusions from the deluge of data points.
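As a toy illustration of that data-reduction step, the sketch below filters a hypothetical microarray result down to genes whose expression changed at least twofold; the gene names, values, and threshold are all invented.

```python
# Toy data reduction for microarray output: keep genes with at least a
# twofold expression change between treated and control samples.
import numpy as np

genes = np.array(["gadd45", "cyp1a1", "tp53", "actb"])
control = np.array([10.0, 5.0, 8.0, 100.0])
treated = np.array([42.0, 55.0, 9.0, 98.0])

log2_fc = np.log2(treated / control)
hits = genes[np.abs(log2_fc) >= 1.0]   # |fold change| >= 2
print(hits)                             # ['gadd45' 'cyp1a1']
```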
However, the task of wading through the data is nothing compared with the effort it took to study how toxicants affected genes before toxicogenomics came along. Back then it was “one gene, one protein at a time,” observes Leona Samson, a toxicologist at the Massachusetts Institute of Technology. Now, “our eyes are being opened to a multitude of responses that are helpful to the cell for recovery.”
Using software to archive, compare, and interpret microarray data, researchers can now begin compiling databases against which new compounds can be tested. For example, Santa Fe, NM-based PHASE-1 Molecular Toxicology Inc. is developing a database that will hold the gene expression profiles of a whole spectrum of chemicals.
The NCT is also building a database. To that end, the NCT announced in November the formation of the Toxicogenomics Research Consortium, a $37 million effort organized by an NIEHS team led by McClure and Ben Van Houten.
To make a useful database that can serve as a universal reference, the researchers must establish standards and fine-tune environmental factors and experimental conditions. They’ll have to consider dosages and finely control how they culture cells and treat animals, including such details as lighting, nutrition and feeding schedules, and handling effects. This will ensure that any changes in gene expression result from the tested chemical and not from the ambient environment.
In the future, toxicogenomics will likely play an important role in personalized medicine, says Erik Jongedijk, marketing and sales director at PHASE-1’s headquarters in Belgium. Yet despite its promise, very few companies offer arrays specifically for toxicological applications, and fewer still have developed technology that’s affordable and accessible to individual academic researchers.
Toxicogenomics is a big-ticket science; most of the players are pharmaceutical and biotech companies. But, as Iannaccone points out, “If you have access to a biotech centre, which most universities have now, then the individual investigator absolutely can do this. Most of the interesting work in the past few years can be traced to a single postdoc in the lab who published the work. Generally, this person picked this up as a project without prior experience. They then wind up getting job offers in drug companies and are whisked out of the lab!”
Advancements in whole genome sequencing have ignited a revolution in digital biology.

Genomics programs across the world are gaining momentum as the cost of high-throughput, next-generation sequencing has declined.
Whether used for sequencing critical-care patients with rare diseases or in population-scale genetics research, whole genome sequencing is becoming a fundamental step in clinical workflows and drug discovery.
But genome sequencing is just the first step. Analyzing genome sequencing data requires accelerated compute, data science, and AI to read and understand the genome. With the end of Moore’s law (the observation that the number of transistors in an integrated circuit doubles roughly every two years), new computing approaches are necessary to lower the cost of data analysis, increase the throughput and accuracy of reads, and ultimately unlock the full potential of the human genome.
An Explosion in Bioinformatics Data
Sequencing an individual’s whole genome generates roughly 100 gigabytes of raw data. That more than doubles once the genome is analyzed using complex algorithms and applications such as deep learning and natural language processing.
As the cost of sequencing a human genome continues to decrease, volumes of sequencing data are exponentially increasing.
An estimated 40 exabytes will be required to store all human genome data by 2025. As a reference, that’s 8x more storage than would be required to store every word spoken in history.
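For a sense of scale, here is a back-of-the-envelope calculation using the rough per-genome figure quoted above; treat the result as an order-of-magnitude estimate only.

```python
# Rough check of the storage figures above, using the article's estimate
# of ~100 GB of raw data per sequenced genome (decimal units, 1 EB = 1e9 GB).
raw_per_genome_gb = 100
total_exabytes = 40
genomes = total_exabytes * 1e9 / raw_per_genome_gb
print(f"~{genomes:.0f} genomes")   # ~400 million genomes' worth of raw data
```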
Many genome analysis pipelines are struggling to keep up with the expansive levels of raw data being generated.
Accelerated Genome Sequencing Analysis Workflows
Sequencing analysis is complicated and computationally intensive, with numerous steps required to identify genetic variants in a human genome.
Deep learning is becoming important for base calling right within the genomic instrument, using recurrent neural network (RNN)- and convolutional neural network (CNN)-based models. Neural networks interpret the image and signal data generated by instruments and infer the 3 billion nucleotide pairs of the human genome. This improves the accuracy of the reads and ensures that base calling occurs closer to real time, further hastening the entire genomics workflow, from sample to variant call format (VCF) to final report.
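As a rough illustration of CNN-based base calling, the sketch below classifies fixed-length windows of raw 1-D instrument signal into one of four bases. The architecture, window size, and tensor shapes are invented for illustration and do not reflect any vendor’s actual model.

```python
# A minimal CNN "base caller" sketch: map a window of raw signal to a base.
import torch
import torch.nn as nn

class TinyBaseCaller(nn.Module):
    def __init__(self, n_bases=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, padding=4),   # detect local signal motifs
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                      # pool over the time axis
        )
        self.classify = nn.Linear(64, n_bases)            # logits for A, C, G, T

    def forward(self, signal):                            # signal: (batch, 1, window)
        x = self.features(signal).squeeze(-1)             # (batch, 64)
        return self.classify(x)                           # (batch, n_bases)

model = TinyBaseCaller()
logits = model(torch.randn(8, 1, 200))                    # 8 hypothetical signal windows
bases = logits.argmax(dim=1)                              # predicted base index per window
```

Production base callers work on long, variable-length signal streams and decode whole read sequences rather than one base per window, but the signal-in, base-out structure is the same idea.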
For secondary genomic analysis, alignment technologies use a reference genome to assist with piecing a genome back together after the sequencing of DNA fragments.
BWA-MEM, a leading algorithm for alignment, helps researchers rapidly map DNA sequence reads to a reference genome. STAR is another gold-standard alignment algorithm, used for RNA-seq data, that delivers accurate, ultrafast alignment to better understand gene expression.
The dynamic programming algorithm Smith-Waterman is also widely used for alignment, a step that’s accelerated 35x on the NVIDIA H100 Tensor Core GPU, which includes a dynamic programming accelerator.
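For reference, here is a plain-Python version of the Smith-Waterman recurrence (score matrix only, no traceback). The scoring parameters are illustrative defaults, not the tuned values of any accelerated implementation.

```python
# Smith-Waterman local alignment: dynamic programming over a score matrix H,
# where each cell is the best of a (mis)match, a gap in either sequence, or 0.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best  # score of the best local alignment

print(smith_waterman("ACACACTA", "AGCACACA"))  # prints the best local alignment score
```

The inner max over three moves (plus the floor at zero that makes the alignment local) is exactly the dynamic programming pattern that dedicated hardware accelerates.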
Uncovering Genetic Variants
One of the most critical stages of sequencing projects is variant calling, where researchers identify differences between a patient’s sample and the reference genome. This helps clinicians determine what genetic disease a critically ill patient might have or helps researchers look across a population to discover new drug targets. These variants can be single-nucleotide changes, small insertions and deletions, or complex rearrangements.
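To show the core idea in miniature, the toy sketch below calls single-nucleotide variants by majority vote over a read pileup. Real callers also model base quality, mapping quality, and diploid genotypes, so this is purely conceptual, with invented data.

```python
# Toy SNV calling: at each covered position, compare the majority base
# observed in the reads against the reference base.
from collections import Counter

def call_snvs(reference, pileups, min_depth=3, min_fraction=0.8):
    """pileups: dict mapping position -> list of bases observed at that position."""
    variants = []
    for pos, bases in sorted(pileups.items()):
        if len(bases) < min_depth:
            continue                                  # too little coverage to call
        allele, count = Counter(bases).most_common(1)[0]
        if allele != reference[pos] and count / len(bases) >= min_fraction:
            variants.append((pos, reference[pos], allele))
    return variants

ref = "ACGTACGT"
reads_at = {2: ["T", "T", "T", "T"], 4: ["A", "A", "A"]}  # hypothetical pileup
print(call_snvs(ref, reads_at))                            # [(2, 'G', 'T')]
```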
GPU-optimized and accelerated callers such as the Broad Institute’s GATK (Genome Analysis Toolkit), used for germline variant calling, increase the speed of analysis. To help researchers remove false positives in GATK results, NVIDIA collaborated with the Broad Institute to introduce NVScoreVariants, a deep learning tool for filtering variants using CNNs.
Deep learning-based variant callers such as Google’s DeepVariant increase the accuracy of calls without the need for a separate filtering step. DeepVariant uses a CNN architecture to call variants and can be retrained to fine-tune its accuracy for each genomic platform’s outputs.
Secondary analysis software in the NVIDIA Parabricks suite of tools has accelerated these variant callers up to 80x. For example, germline HaplotypeCaller’s runtime is reduced from 16 hours in a CPU-based environment to less than five minutes with GPU-accelerated Parabricks.
Accelerating the Next Wave of Genomics
NVIDIA is helping to enable the next wave of genomics by powering both short- and long-read sequencing platforms with accelerated AI base calling and variant calling. Industry leaders and startups are working with NVIDIA to push the boundaries of whole genome sequencing.
For example, biotech company PacBio recently announced the Revio system, a new long-read sequencing system featuring NVIDIA Tensor Core GPUs. Enabled by a 20x increase in computing power relative to prior systems, Revio is designed to sequence human genomes with high-accuracy long reads at scale for under $1,000.
Oxford Nanopore Technologies offers the only technology that can sequence DNA or RNA fragments of any length in real time, features that allow the rapid discovery of more genetic variation. Seattle Children’s Hospital recently used PromethION, a high-throughput nanopore sequencing instrument, to understand a genetic disorder in the first few hours of a newborn’s life.
Ultima Genomics is offering high-throughput whole genome sequencing at just $100 per sample, and Singular Genomics bills its G4 as the most powerful benchtop system.
Tox-GAN: An Artificial Intelligence Approach Alternative to Animal Studies—A Case Study With Toxicogenomics
Generative adversarial networks (GANs) are a set of unsupervised deep learning approaches that can generate new data with a statistical distribution similar to that of the real data (Goodfellow et al., 2014). A GAN consists of two models, the generator G and the discriminator D, coordinating with each other. In our case, generator G was trained to generate the transcriptomic profile at a particular time and dose combination for a compound. The generated transcriptomic profile was then passed to discriminator D to be compared with the real transcriptomic profile obtained under the same treatment conditions (i.e., the same time/dose combination). If the generated transcriptomic profile differed from the real one, discriminator D provided feedback to generator G for improvement. The process was repeated until discriminator D could no longer distinguish generated transcriptomic profiles from real ones. Eventually, the trained generator G could infer the transcriptomic profile at a particular time and dose combination based on the chemical information.
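The sketch below is a minimal conditional GAN training step in PyTorch, illustrating the generator/discriminator interplay described above. The network sizes, conditioning vector, and data shapes are invented for illustration and do not reproduce the Tox-GAN architecture.

```python
# Minimal conditional GAN step: G generates expression profiles conditioned
# on a (hypothetical) chemical/dose/time encoding; D scores real vs. generated.
import torch
import torch.nn as nn

n_genes, n_cond = 978, 16   # profile length and conditioning size (illustrative)
G = nn.Sequential(nn.Linear(64 + n_cond, 256), nn.ReLU(), nn.Linear(256, n_genes))
D = nn.Sequential(nn.Linear(n_genes + n_cond, 256), nn.ReLU(), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_profiles, cond):
    batch = real_profiles.size(0)
    noise = torch.randn(batch, 64)
    fake = G(torch.cat([noise, cond], dim=1))

    # Discriminator: push real profiles toward 1, generated profiles toward 0.
    d_loss = bce(D(torch.cat([real_profiles, cond], 1)), torch.ones(batch, 1)) + \
             bce(D(torch.cat([fake.detach(), cond], 1)), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make D score generated profiles as real.
    g_loss = bce(D(torch.cat([fake, cond], 1)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# One optimization step on a hypothetical batch of 32 treatment conditions.
train_step(torch.randn(32, n_genes), torch.randn(32, n_cond))
```

Training alternates these two updates until the discriminator can no longer tell generated profiles from real ones, at which point the trained generator serves as the profile-inference model described above.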
