Computational Aspects of Biological Information 2016

Speaker: Johan Paulsson

All processes in single cells involve components present in low numbers, creating spontaneous fluctuations that in turn can enslave the components present in high numbers. In the first half of the talk I will discuss some mathematical frameworks we developed to analytically predict and analyze random fluctuations in complex processes. The second half of the talk will focus on experimental methods to quantify dynamics in cells, as well as examples of microbial systems where fluctuations play a large role, including feedback control of replication, cell fate decisions, epigenetic oscillations and DNA repair.

Speaker: Bill Hahn

Although we now have a draft view of the genetic alterations that occur in human cancer, the number of mutations found at low frequency and the molecular heterogeneity of most cancers make identifying genes that contribute to cancer phenotypes challenging. Determining the function of genes altered in cancer genomes is essential to develop new therapeutic approaches. To complement these genome characterization studies, we have used genome-scale gain- and loss-of-function approaches to identify genes required for cell survival and transformation. Specifically, we have performed systematic studies to interrogate rare alleles found altered in cancer genomes and used advances in gene synthesis to prospectively interrogate all possible alleles of known cancer genes. In parallel, we have performed both genome-scale RNAi and CRISPR-Cas9 screens in more than 500 cell lines to identify differentially essential genes and the context that specifies gene dependency. This approach now permits us to identify and classify cancer dependencies. These studies allow us to begin to define a global cancer dependencies map.

Speaker: Ben Raphael

Cancer is an evolutionary process driven by somatic mutations that accumulate in a population of cells. In this talk, I will describe several algorithms to reconstruct this process from DNA sequencing data of tumor samples. These algorithms address challenges that distinguish the cancer genome phylogeny problem from classical phylogenetic tree reconstruction. I will demonstrate the application of these algorithms to sequencing data from multiple cancer types.

Speaker: George Church

Exponential technologies such as Expansion Fluorescent In Situ Sequencing (FISSeq/ExSeq) enable computational analysis of multicellular organs, including synapse-level resolution of the connectome and transcriptome, plus nucleosome-level-resolution chromosome chain tracing in situ (via Oligopaints). We can also computationally design and build whole genomes (via synthesis and recombination) and epigenomes (via comprehensive transcription factor libraries). Data from the IARPA MICrONS BRAIN project is aimed at new insights into visual machine learning strategies.

Speaker: Leslie Valiant

Living organisms function according to protein circuits. Darwin’s theory of evolution suggests that these circuits have evolved through variation guided by natural selection. However, the question of which kinds of circuits can so evolve in realistic population sizes and within realistic numbers of generations has remained essentially unaddressed.

We suggest that computational learning theory offers the framework for investigating this question of how circuits can come into existence adaptively from experience, without a designer, or then be maintained. We formulate evolution as a form of learning from examples. The targets of the learning process are the functions of highest fitness. The examples are the experiences. The learning process is constrained so that the feedback from the experiences is Darwinian. We formulate a notion of evolvability that distinguishes function classes that are evolvable with polynomially bounded resources from those that are not. The dilemma is that if the function class, in particular that of the expression levels of proteins in terms of each other, is too restrictive, then it will not support biology, while if it is too expressive then no evolution algorithm will exist to navigate it. We shall review current work in this area.

Speaker: Ben Neale

With the advent of sequencing technology and ever-increasing genome-wide association datasets, we need tools to meet the challenges of scale. Here I will describe our efforts to develop a software package, Hail, that is built using Spark and Scala. Hail leverages a distributed model of computing to perform scalable sequence data quality control and analysis. In doing so, we can perform primary quality control analyses on whole genome sequencing datasets of ~5,000 individuals in under an hour. Using Hail, we have performed analyses of educational attainment on a sample of over 14,000 individuals, identifying a clear role of ultra-rare disruptive mutations in the genetic architecture. We further explored the consequences of this class of variation across a wide range of traits and demonstrated that neuropsychiatric traits appear to have a directional burden effect, in contrast to later-onset systemic disease.

Speaker: David Gifford

With the advent of multiplexed DNA oligo synthesis, CRISPR genome editing, and high-throughput sequencing, it is now possible to characterize genome function with experiments that directly observe the effect of sequence variants. We will discuss the computational design and analysis of sequence variants that have a causal effect on the binding of DNA regulatory proteins and proximal gene expression. New results include the observation of regulatory elements in regions of the genome with no observed epigenetic marks.

Speaker: Martha Bulyk

Sequencing of exomes and genomes has revealed abundant genetic variation affecting the coding sequences of human transcription factors (TFs), but the consequences of such variation remain largely unexplored. We developed a computational, structure-based approach to evaluate TF variants for their impact on DNA-binding activity and used universal protein binding microarrays to assay sequence-specific DNA-binding activity across 41 reference and 117 variant alleles found in individuals of diverse ancestries and families with Mendelian diseases. We found 77 variants in 28 genes that affect DNA-binding affinity or specificity and identified thousands of rare alleles likely to alter the DNA-binding activity of human sequence-specific TFs. Our results suggest that most individuals have unique repertoires of TF DNA-binding activities, which may contribute to phenotypic variation.

Johan Paulsson: The role of fluctuations in individual cells

Bill Hahn: Systematic identification of cancer targets

Ben Raphael: Cancer Genome Evolution

George Church: Reading & Writing Omni Omics In Situ

Leslie Valiant: Darwinian Evolution as Learning

Ben Neale: Scaling up genetic analysis

David Gifford: Mapping the regulatory genome

Martha Bulyk: Genetic variation in human transcription factors