
Cornell University

3CPG

Cornell Center for Comparative and Population Genomics

Events

September 23, 2024

Seminar title: Seagrass wasting disease ecology in the Pacific Northwest: pathogen strain diversity, transmission, and pathogenesis.

Hosted by Drew Harvell and Andre Dhondt.

September 20, 2024

Seminar title: Global geochemical thresholds and the boundaries of soil fertility.

Hosted by Meredith Holgerson, Roxanne Marino, and Christy Goodale.

September 16, 2024

Seminar title: Adaptive plasticity in response to environmental stress: mechanisms and consequences.

Hosted by Maren Vitousek.

September 9, 2024

Seminar title: Drawing as a teaching tool in the undergraduate biology classroom, lab, and field.

Hosted by Leslie Babonis.

August 30, 2024

Dr. Magnus Nordborg, Scientific Director, Gregor Mendel Institute, Austrian Academy of Sciences

August 26, 2024

Seminar title: Why are ponds biogeochemical hotspots? Examining how ecosystem structure and function scale with waterbody size.

Hosted by Bob Howarth.

May 17, 2024

Dr. Adam Siepel, Professor and Chair, Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory

“Probabilistic and machine-learning methods for problems in population genomics and transcriptional regulation”

I will describe my research group’s recent progress in developing computational methods to address two mostly unrelated problems in genomics: inference of selective sweeps from population genomic data and characterization of the dynamics of transcription from nascent RNA sequencing data. In the first part of the talk, I will describe our methods for inferring ancestral recombination graphs (ARGs) from sequence data, and then show how features from inferred ARGs can be used in a neural-network setting to improve not only the detection of selective sweeps but also estimation of selection coefficients and allele frequency trajectories. I will then present a new approach for mitigating the problem of “simulation misspecification” that arises when training neural networks of this kind, by framing it as a problem of “domain adaptation” and using a gradient reversal layer to improve generalization to real data.

In the second part of the talk, I will introduce a unified probabilistic model for the dynamics of transcription initiation, promoter-proximal pause escape, and elongation, and the generation of nascent RNA sequencing read counts under steady-state conditions. I will show using simulated data that the approach yields accurate estimation of key rate parameters and correctly identifies epigenomic and DNA-sequence covariates of local elongation rates. Then I will summarize analyses of several publicly available PRO-seq data sets, showing that pause escape is often strongly rate-limiting, that steric hindrance in the promoter-proximal region can dramatically reduce initiation rates, and that reductions in local elongation rate are associated with cytosine nucleotides, DNA methylation, splice sites, RNA stem-loops, CTCF binding sites, and several histone marks. Finally, I will introduce a convolutional neural network that improves our predictions of local elongation rates.

Altogether, the talk will summarize several years of methods development in two important areas of genomics, and insights from applying these new methods to real genomic data.
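The abstract mentions handling simulation misspecification as domain adaptation with a gradient reversal layer. The general technique, from the domain-adversarial neural network literature, can be sketched with scalar parameters and hand-derived gradients. The layer is the identity on the forward pass and multiplies the gradient by -lam on the backward pass. As a result, a domain classifier learns to tell simulated from real inputs while the shared feature extractor is pushed to make them indistinguishable. This is a toy illustration, not the speaker's architecture, which uses deep networks on ARG-derived features. The names `dann_gradients` and `lam`, and the labeling of simulated data as `d = 1` and real data as `d = 0`, are hypothetical.

```python
import math
from typing import Optional, Tuple


def grl_forward(x: float) -> float:
    """Gradient reversal layer, forward pass: the identity."""
    return x


def grl_backward(grad: float, lam: float) -> float:
    """Gradient reversal layer, backward pass: flip the sign and scale by lam."""
    return -lam * grad


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dann_gradients(
    w: float, a: float, b: float,
    x: float, y: Optional[float], d: int, lam: float,
) -> Tuple[float, float, float]:
    """Hand-derived gradients for a toy scalar domain-adversarial model.

    f     = w * x                 shared feature extractor
    y_hat = a * f                 task head, squared-error loss; y is None
                                  for unlabeled (real) examples
    d_hat = sigmoid(b * GRL(f))   domain head, binary cross-entropy;
                                  d = 1 for simulated, 0 for real (assumed)
    Returns (dL/dw, dL/da, dL/db) as seen by each parameter.
    """
    f = w * x

    # Domain branch: the domain head b is trained normally to separate domains.
    d_hat = sigmoid(b * grl_forward(f))
    dLd_dlogit = d_hat - d            # derivative of BCE w.r.t. the logit
    grad_b = dLd_dlogit * f
    dLd_df = dLd_dlogit * b           # gradient arriving at the GRL output
    # The GRL flips this gradient before it reaches the feature extractor,
    # so gradient descent on w increases the domain loss.
    grad_w = grl_backward(dLd_df, lam) * x

    # Task branch: only labeled (simulated) examples contribute.
    grad_a = 0.0
    if y is not None:
        dLy_dyhat = 2.0 * (a * f - y)
        grad_a = dLy_dyhat * f
        grad_w += dLy_dyhat * a * x

    return grad_w, grad_a, grad_b


# One plain SGD step on a single simulated, labeled example (toy values).
w, a, b, lr = 0.5, 1.0, 0.3, 0.1
gw, ga, gb = dann_gradients(w, a, b, x=1.2, y=0.8, d=1, lam=1.0)
w, a, b = w - lr * gw, a - lr * ga, b - lr * gb
```

Setting `lam` to 0 recovers ordinary training on the task loss alone. Schedules that increase `lam` during training are common. A practical implementation would express the reversal through an autodiff framework's custom-gradient mechanism rather than deriving gradients by hand.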

May 9, 2024

May 8, 2024

The Weill Institute for Cell and Molecular Biology's Career Council invites trainees from across Cornell University to hear from Marcus Smolka, Ph.D. (Professor, Molecular Biology & Genetics, and Interim Director, Weill Institute for Cell and Molecular Biology), as he reflects on his Journey Through Science and the challenges and triumphs that led him to where he is today. There will be plenty of opportunity for Q&A, and refreshments will be provided to all attendees.

May 1, 2024

Dr. Magnus Nordborg, Scientific Director, Gregor Mendel Institute, Austrian Academy of Sciences

“Towards an unbiased characterization of genetic diversity”

Our view of genetic diversity is shaped by methods that provide an incomplete and highly biased picture, effectively limited to single-nucleotide polymorphism in conserved regions of the genome. Long-read sequencing technologies, which are starting to provide nearly complete genome sequences for population samples, should solve the problem, except that characterizing and making sense of non-SNP variation is difficult even with perfect sequence data. I will describe our attempts to investigate and address this problem using samples of genomes from Arabidopsis thaliana as an example. Our analyses reveal substantial and worrying biases in current data that affect everything from GWAS and functional genomics to population genetics and diversity studies. We also discover exciting new biology, especially in understanding the evolutionary dynamics of transposable elements. We demonstrate that existing genome annotation tools do not predict mobile elements well, even in a model plant, and we present alternative algorithms.

Twenty-five years ago, technical developments that came out of the Human Genome Project ushered in the SNP era, leading to a revolution in population genetics, as predicted by Aravinda Chakravarti, who noted that we needed models to “make sense out of sequence”. I will argue that we are in an analogous position now, with technologies making it easy to generate complete genome sequences of almost any species at a population scale: an enormous breakthrough for anyone interested in the full diversity of life. However, to make sense of these data, we will need a modeling framework that is rooted in population genetics but also incorporates accurate mechanistic models of the mutational and recombination processes that ultimately generate genetic variation. This framework remains to be developed.