Saturday, March 28, 2009

Chapter 13 Microbial Genomics

A Short History of Genomics
Genome
Entire complement of genetic information
Includes genes, regulatory sequences, and noncoding DNA
Genomics
Discipline of mapping, sequencing, analyzing, and comparing genomes

Prokaryotic Genomes: Sizes and ORF Contents

On average a prokaryotic gene is about 1,000 bp long
Roughly 1,000 genes per megabase (Mbp; 1,000,000 bp)
As genome size increases, gene content increases proportionally
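The arithmetic behind the 1,000-genes-per-Mbp rule can be sketched in Python. This is a minimal sketch that assumes the genome is almost entirely coding, which is roughly true for prokaryotes but not for eukaryotes. The function names are hypothetical, and the rule only gives an order-of-magnitude estimate.

```python
AVG_GENE_BP = 1_000       # average prokaryotic gene length, from the notes
BP_PER_MBP = 1_000_000


def genes_per_mbp(avg_gene_bp: int = AVG_GENE_BP) -> float:
    """Genes expected per megabase if the genome is almost entirely coding."""
    return BP_PER_MBP / avg_gene_bp


def estimated_gene_count(genome_size_bp: int, avg_gene_bp: int = AVG_GENE_BP) -> int:
    """Rough gene count: genome size divided by average gene length."""
    if genome_size_bp <= 0 or avg_gene_bp <= 0:
        raise ValueError("sizes must be positive")
    return round(genome_size_bp / avg_gene_bp)


# Genome sizes quoted elsewhere in these notes
for name, size_bp in [("Nanoarchaeum equitans", 490_000),
                      ("Mycobacterium tuberculosis", 4_400_000),
                      ("Sorangium cellulosum", 12_300_000)]:
    print(f"{name}: ~{estimated_gene_count(size_bp):,} genes")
```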

Prokaryotic genomes range in size from those of large viruses to those of eukaryotic microbes
Unlike prokaryotes, eukaryotic genomes contain a large fraction of non-coding DNA
Smallest cellular genomes to date belong to parasitic or endosymbiotic prokaryotes
Obligate parasites range from 490 kbp (Nanoarchaeum equitans) to 4,400 kbp (Mycobacterium tuberculosis)
Endosymbionts can be even smaller (e.g., the 160 kbp genome of Carsonella ruddii)
Estimates suggest the minimum number of genes for a viable cell is 250–300

Largest prokaryotic genomes comparable to those of some eukaryotes
Sorangium cellulosum (Bacteria)
Largest prokaryotic genome to date at 12.3 Mbp
Largest archaeal genomes tend to be smaller (~5 Mbp)
Prokaryotic Genomes: Bioinformatic Analyses
Bioinformatics
Science that applies powerful computational tools to DNA and protein sequences
For the purpose of analyzing, storing, and accessing the sequences for comparative purposes

The complement of genes in a particular organism defines its biology, but genomes are also molded by the organism’s lifestyle

Many genes can be identified by sequence similarity to genes found in other organisms (comparative analysis)
Comparative analyses allow for predictions of metabolic pathways and transport systems

Gene Distribution in Prokaryotes
Metabolic genes are typically the most abundant class
DNA replication and transcription genes make up a minor fraction of the genome
Nontranslated RNA genes are typically prevalent
E.g., rRNA, tRNA, small regulatory RNAs

The number of genes whose role can be clearly identified in a given genome is 70% or less of the total ORFs detected
Hypothetical proteins: uncharacterized ORFs; proteins that likely exist but whose function is presently unknown
Likely encode nonessential functions
In E. coli, many predicted to encode regulatory or redundant proteins
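Counting ORFs starts with finding them. The sketch below is a deliberately simple scanner, assuming an ORF is an ATG start followed in frame by the first TAA, TAG or TGA stop, and scanning only the forward strand. `find_orfs` and its `min_codons` cutoff are hypothetical choices, not a published gene finder; real annotation tools (for example Glimmer or Prodigal) also scan the reverse strand, allow alternative start codons, and score coding potential statistically.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG; end is the index just past the stop.
    Length in codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # close it and keep scanning
    return orfs


print(find_orfs("CCATGAAAAAAAAATAAGG", min_codons=5))
```

Because short random ORFs are common, the cutoff matters: lowering `min_codons` inflates the ORF count, which in turn lowers the fraction of ORFs with an assigned role.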
Inaccuracies in some annotations are problematic
As many as 10% of annotated genes are incorrectly annotated
The percentage of an organism’s genes devoted to a specific cell function depends to some degree on genome size

Gene Distribution in Bacteria and Archaea
Archaea typically devote a higher percentage of their genomes to energy and coenzyme production than do Bacteria
Archaea contain fewer genes for carbohydrate metabolism or cytoplasmic membrane functions than do Bacteria

The Genomes of Eukaryotic Organelles
Mitochondria and chloroplasts contain their own small genomes
Also contain the necessary machinery for protein synthesis
Including ribosomes, tRNAs, and all other components necessary for translation and the formation of functional proteins

Known Chloroplast Genomes
Circular DNA molecules
Typically 120–160 kbp
Contain two inverted repeats of 6–76 kbp
Many genes encode proteins for photosynthesis and autotrophy
Introns common; primarily of self-splicing type

Known Mitochondrial Genomes
Diverse structures; some linear
Typically smaller than chloroplast genomes
Primarily encode proteins for oxidative phosphorylation
Many use variant (often simplified) genetic codes rather than the “universal” code
Some contain small plasmids
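The difference between the standard code and a mitochondrial code can be made concrete with a codon table. As one well-documented case, the vertebrate mitochondrial code (NCBI translation table 2) reads TGA as Trp, ATA as Met, and AGA/AGG as stop; other mitochondrial lineages use other variants. This sketch builds the standard table from the usual TCAG codon ordering; `translate` is a hypothetical helper that ignores an incomplete trailing codon.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code (NCBI table 2): four codons reassigned
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna: str, code: dict[str, str]) -> str:
    """Translate DNA codon by codon with the given table ('*' marks a stop)."""
    dna = dna.upper()
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("ATGTGAAGA", STANDARD))   # M*R
print(translate("ATGTGAAGA", VERT_MITO))  # MW*
```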

Many genes in the nucleus encode proteins required for organelle function
E.g., translational machinery, energy generation

Eukaryotic Microbial Genomes
The Haploid Yeast Genome
Contains 16 chromosomes, ranging in size from 220 kbp to 2,352 kbp
Entire genome is ~13,392 kbp; encodes ~6,600 ORFs, ~3,500 of which encode proteins with known function
At least 877 ORFs are essential; at least 3,121 are not
Contains a large amount of repetitive DNA

Smallest eukaryotic cellular genome belongs to Encephalitozoon cuniculi
Intracellular pathogen
Haploid genome contains 11 chromosomes
Genome size 2.9 Mbp; ~ 2,000 genes
Smallest eukaryotic genome belongs to a nucleomorph
Degenerate remains of a eukaryotic endosymbiont
Ranges in size from 0.45 to 0.85 Mbp
Largest gene content of any sequenced eukaryotic microbe belongs to Trichomonas vaginalis
Parasite
~60,000 genes (more than twice as many as humans)
Microarrays and the Transcriptome
Transcriptome
The entire complement of RNA produced under a given set of conditions
Hybridization techniques can be used in conjunction with genomic sequence data to measure gene expression
Microarrays
Small solid-state supports to which genes or portions of genes are fixed and arrayed spatially in a known pattern

DNA segments on arrays are hybridized with mRNA from cells grown under specific conditions and analyzed to determine patterns of gene expression
Arrays are large and dense enough that the transcription pattern of an entire genome can be analyzed
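Expression patterns from an array are commonly summarized as log2 ratios of spot intensity between a test condition and a reference. This minimal sketch assumes background-subtracted, already normalized intensities and a 2-fold cutoff (|log2 ratio| ≥ 1). The gene names, intensities, pseudocount and threshold are all hypothetical; a real analysis would also use replicate arrays and statistical testing.

```python
import math


def log2_ratios(test: dict[str, float], reference: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """log2(test / reference) per gene; the pseudocount avoids division by zero."""
    return {gene: math.log2((test[gene] + pseudocount) / (reference[gene] + pseudocount))
            for gene in test.keys() & reference.keys()}


def classify(ratios: dict[str, float], threshold: float = 1.0) -> dict[str, str]:
    """Label each gene 'up', 'down' or 'unchanged' using a log2 cutoff."""
    labels = {}
    for gene, r in ratios.items():
        if r >= threshold:
            labels[gene] = "up"
        elif r <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical spot intensities for three genes under two growth conditions
reference = {"geneA": 99.0, "geneB": 399.0, "geneC": 199.0}
test = {"geneA": 399.0, "geneB": 99.0, "geneC": 209.0}
print(classify(log2_ratios(test, reference)))
```

Working in log2 makes induction and repression symmetric: a 4-fold increase is +2 and a 4-fold decrease is -2.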

What can be learned from microarray experiments?
Global gene expression
Expression of specific groups of genes under different conditions
Expression of genes with unknown function; can yield clues to possible roles
Comparison of gene content in closely related organisms
Identification of specific organisms
Proteomics
Genome-wide study of the structure, function, and regulation of an organism’s proteins
Two-dimensional (2-D) polyacrylamide gel electrophoresis
Technique for the separation, identification, and measurement of all proteins present in a sample
In first (horizontal) dimension, proteins separated by differences in isoelectric points
In second (vertical) dimension, proteins separated by size
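The first dimension separates proteins by isoelectric point (pI), the pH at which a protein carries no net charge. A pI can be estimated from sequence by summing Henderson-Hasselbalch charge terms for the ionizable groups and searching for the zero crossing. The pKa values below are approximate and are an assumption; published tools use different pKa sets, so their predictions differ slightly. Function names are hypothetical.

```python
# Approximate pKa values (assumption: one textbook-style set; tools differ)
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a protein's net charge at a given pH."""
    protein = protein.upper()
    counts = {aa: protein.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1   # one free amino and one carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein: str, tol: float = 0.001) -> float:
    """Bisection search for the pH at which net charge is zero.

    Net charge falls monotonically as pH rises, so there is a single root.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid   # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("KKKK"), 2), round(isoelectric_point("DDDD"), 2))
```

A lysine-rich peptide focuses near the basic end of the gel and an aspartate-rich one near the acidic end.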

Proteins with > 50% sequence identity typically have similar functions
Proteins with > 70% sequence identity almost certainly have similar functions
Protein domains
Distinct structural modules within proteins
Have characteristic functions that can reveal much about a protein’s role, even in the absence of complete sequence homology
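The identity thresholds above are normally applied to a pairwise alignment. This sketch assumes the two sequences are already aligned (equal length, gaps written as `-`) and computes identity over gap-free columns. That denominator is only one of several conventions in use, and the helper names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def functional_inference(identity: float) -> str:
    """Apply the rule-of-thumb thresholds from the notes."""
    if identity > 70:
        return "almost certainly similar function"
    if identity > 50:
        return "typically similar function"
    return "function uncertain from identity alone"


identity = percent_identity("MKT-LV", "MKTALI")
print(identity, functional_inference(identity))
```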

Nucleic Acid and Amino Acid Sequence Similarities

Metabolomics
Metabolome
The complete set of metabolic intermediates and other small molecules produced in an organism
Mass spectrometry is one of the primary techniques for monitoring metabolites
Gene Families, Duplications, and Deletions
Homologous: related in sequence to an extent that implies common genetic ancestry
Gene families: groups of gene homologs
Paralogs: genes within an organism whose similarity to one or more genes in the same organism is the result of gene duplication
Orthologs: genes found in one organism that are similar to those in another organism but differ because of speciation
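Candidate orthologs are often proposed computationally with the reciprocal best hit heuristic: gene a in organism 1 and gene b in organism 2 are paired if each is the other's highest-scoring match. This is a common shortcut rather than a method stated in these notes, and the similarity scores below are hypothetical stand-ins for, e.g., BLAST bit scores.

```python
Scores = dict[tuple[str, str], float]   # (query gene, subject gene) -> similarity score


def best_hits(scores: Scores) -> dict[str, str]:
    """For each query gene, the subject gene with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> list[tuple[str, str]]:
    """Candidate ortholog pairs: a's best hit is b and b's best hit is a."""
    forward, backward = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical: organism 1 has paralogs x1 and x2 (a duplication); organism 2 has y1
a_vs_b = {("x1", "y1"): 250.0, ("x2", "y1"): 180.0}
b_vs_a = {("y1", "x1"): 250.0, ("y1", "x2"): 180.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('x1', 'y1')]
```

The duplication shows the heuristic's limitation: the paralog x2 gets no reciprocal partner, so genes that arose by duplication after speciation are easily missed.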
Gene duplications are thought to be the mechanism for the evolution of most new genes

Deletions can eliminate genes that are no longer needed

Gene analysis in the three domains of life suggests that many genes present in all organisms have common evolutionary roots
Mobile DNA: Transposons and Insertion Sequences
Horizontal Gene Transfer
The transfer of genetic information between organisms, as opposed to vertical inheritance from parental organism(s)
May be extensive in nature
May cross phylogenetic domain boundaries
Detecting Horizontal Gene Flow
Presence of genes typically found only in distantly related species
Presence of DNA whose GC content or codon bias differs significantly from that of the remainder of the genome
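The GC-content test can be run as a sliding window across a genome, flagging windows whose GC fraction deviates strongly from the genome-wide average. The window size, step, and deviation cutoff in this sketch are assumptions chosen for illustration, and the function names are hypothetical. Codon-bias tests are not covered, and a flagged window is only a candidate for horizontal transfer.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_gc_windows(genome: str, window: int = 5_000, step: int = 1_000,
                        max_deviation: float = 0.08) -> list[tuple[int, float]]:
    """(start, GC) for windows whose GC differs from the genome mean by > max_deviation."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > max_deviation:
            flagged.append((start, gc))
    return flagged


# Synthetic example: a 5 kbp AT-rich insert inside a 50% GC background
genome = "GCAT" * 5000 + "AT" * 2500 + "GCAT" * 5000
print(atypical_gc_windows(genome))
```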
Horizontally transferred genes typically encode non-core metabolic functions

Horizontal Gene Transfer and Genome Stability
Transposons may transfer DNA between different organisms
Transposons may also mediate large-scale chromosomal changes within a single organism
Presence of multiple insertion sequences (IS)
Recombination among identical IS can result in chromosomal rearrangements
E.g., deletions, inversions, or translocations
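Whether recombination between two copies of an IS deletes or inverts DNA depends on their relative orientation: crossing over between direct repeats excises the intervening segment (leaving one IS copy behind), whereas crossing over between inverted repeats flips it. This toy string model, with hypothetical helper names, assumes exactly two perfect copies of the element and does not model translocations.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def recombine_between_is(chrom: str, is_seq: str) -> str:
    """Toy outcome of recombination between two copies of one IS element."""
    first = chrom.find(is_seq)
    if first == -1:
        raise ValueError("IS element not found")
    after_first = first + len(is_seq)
    direct = chrom.find(is_seq, after_first)
    if direct != -1:
        # Direct repeats: the segment and one IS copy are excised (deletion)
        return chrom[:after_first] + chrom[direct + len(is_seq):]
    inverted = chrom.find(revcomp(is_seq), after_first)
    if inverted != -1:
        # Inverted repeats: the segment between the copies is flipped (inversion)
        middle = chrom[after_first:inverted]
        return chrom[:after_first] + revcomp(middle) + chrom[inverted:]
    raise ValueError("no second IS copy found")


IS = "GGGTTT"
print(recombine_between_is("AA" + IS + "ACGA" + IS + "TT", IS))            # deletion
print(recombine_between_is("AA" + IS + "ACGA" + revcomp(IS) + "TT", IS))   # inversion
```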

Integrons
Genetic elements that collect and express genes carried on mobile segments of DNA (cassettes)
Of those known, most carry genes for antibiotic resistance

Evolution of Virulence: Pathogenicity Islands
Chromosomal Islands
Region of the bacterial chromosome of foreign origin that contains clustered genes for some extra property, such as virulence or symbiosis
Pathogenicity islands: chromosomal islands containing genes for virulence

Chromosomal islands believed to have a “foreign” origin based on several observations
Extra regions often flanked by inverted repeats
Base composition and codon usage in chromosomal islands often differ from rest of genome
Often found in some strains of a species but not others
Chromosomal islands contribute specialized functions not essential to growth
Virulence
Biodegradation of recalcitrant compounds
E.g., hydrocarbons and herbicides
Symbiosis

The “pan”/ “core” concept: bacterial species consist of two components
Core genome: shared by all strains of the species
Pan genome: the core genome plus all the optional extras present in some but not all strains of the species
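Core and pan genomes map directly onto set operations over the gene content of each sequenced strain. This sketch follows the common usage in which the pan genome is the union of all genes (core plus strain-specific extras) and the core genome is the intersection; the strain names and gene sets are hypothetical.

```python
def core_and_pan(strain_genes: dict[str, set[str]]):
    """Return (core, pan, accessory-by-strain) for a collection of strains."""
    if not strain_genes:
        raise ValueError("need at least one strain")
    gene_sets = list(strain_genes.values())
    core = set.intersection(*gene_sets)   # genes present in every strain
    pan = set.union(*gene_sets)           # every gene seen in any strain
    accessory = {strain: genes - core for strain, genes in strain_genes.items()}
    return core, pan, accessory


# Hypothetical gene sets for three strains of one species
strains = {
    "strain1": {"g1", "g2", "g3", "islandX1"},
    "strain2": {"g1", "g2", "g3"},
    "strain3": {"g1", "g2", "g3", "islandY1", "islandY2"},
}
core, pan, accessory = core_and_pan(strains)
print(sorted(core))  # ['g1', 'g2', 'g3']
print(len(pan))      # 6
```

As more strains are sequenced, the core typically shrinks or stays flat while the pan genome grows, since each new strain can only remove genes from the intersection and add genes to the union.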

Detecting Uncultured Microorganisms
Metagenome
The total gene content of the organisms present in an environment
Several environments have been surveyed by large-scale metagenome projects
E.g., acid mine run-off waters, deep sea sediments, fertile soils

Viral Genomes in Nature
Viruses are more prevalent than bacteria in the environment
Most are bacteriophages and have populations that turn over rapidly
Most of the genetic diversity on Earth thought to reside in viruses
Most virus genes are uncharacterized and show little or no sequence similarity to known genes
