script.txt

Thank you, xy, for the kind introduction.
Hi! Welcome to the biology section of linux.conf.au. If you want to learn how
new antibiotics are discovered, you've come to the right auditorium.

I'm going to present antiSMASH, the software I'm developing as my Ph.D. project.
It's open source software licensed under the GNU GPLv3 (or later), and we also
run a public instance for the scientific community to use.

But before I start talking about the software I'm working on, let me give you a
short primer on the biology side of things. Without that background, the rest
of the talk will be much harder to follow. Feel free to interrupt with
questions at any time.

As you might have seen on the first slide, I work in the Division for
Microbiology/Biotechnology at the Microbiology Institute of the University of
Tübingen, Germany. So, biotechnology: what is it all about?

The United Nations "Convention on Biological Diversity" defines biotechnology
as "Any technological application that uses biological systems, living
organisms, or derivatives thereof, to make or modify products or processes for
specific use". Quite a mouthful. So instead, let me introduce a metaphor that
I'll build the rest of my explanations on.

In biotechnology, we take biological systems such as bacteria or yeast and turn
them into little factories that produce the things we want. A popular example
would be... beer. It's one of the oldest biotech applications on the planet. We
use a certain kind of yeast (Saccharomyces cerevisiae) to turn sugar into
alcohol and carbon dioxide. Another widespread example is the use of a
bacterium (Escherichia coli) to produce human insulin for treating people with
diabetes.

Now, what's so nice about using these tiny organisms to produce substances
instead of going for a full chemical synthesis? Well, the first reason is that
in some cases, like yeast producing ethanol, nature has already built that
functionality into the organism. It's much easier to just let the yeast do its
thing than it would be to do the synthesis from scratch.

Using bacteria to produce human insulin is a different story. The bacteria
involved don't naturally produce insulin; they were engineered to do so.
However, there's another reason we use biological systems to produce things.
Unlike a real factory, bacteria reproduce themselves. So if you provide enough
food, a tiny amount of starter bacteria will multiply, and soon you have a lot
of little factories running your production line. This turns out to be much
more efficient than harvesting animal insulin from pigs or other large
animals.
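
To put a rough number on "a tiny amount of starter bacteria will multiply":
with enough food, a population that doubles at a fixed interval grows
exponentially, N(t) = N0 * 2^(t / doubling time). A minimal Python sketch,
assuming an idealised doubling time of 20 minutes (a figure often quoted for
E. coli in rich lab medium) and unlimited food, neither of which holds for long
in a real culture:

```python
def population_after(start_cells: int, hours: float,
                     doubling_minutes: float = 20.0) -> int:
    """Idealised exponential growth: N(t) = N0 * 2 ** (t / doubling_time).

    Assumes unlimited food and a constant doubling time; real cultures
    slow down and stop growing once food or space runs out.
    """
    doublings = (hours * 60.0) / doubling_minutes
    return int(start_cells * 2 ** doublings)


if __name__ == "__main__":
    # 1,000 starter cells, 10 hours, 20-minute doublings: 30 doublings.
    print(population_after(1000, 10))  # 1073741824000
```

Thirty doublings turn a thousand cells into about a trillion, which is the
point of the factory metaphor: these factories build more of themselves.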

Just like a big factory, our little biofactories need machines to build their
products. In biology, these machines are called enzymes. Some of them perform
the complex chemical reactions needed to build up products. Others act as
sensors that tell the cell about its environment. Regulators act on the input
from these sensors and allow the cell to adapt to changes or find food. Last
but not least, there are special machines that build new machines. Those are
called ribosomes, and we'll have a closer look at them in a minute.

Because living organisms need to keep up with an ever-changing environment,
nature has provided them with a wide variety of tools. It would not be
efficient to keep all those machines around when they're not in use. Instead,
cells only carry the blueprints for the vast array of machines they can build.
When the cell needs a specific machine, it selects a blueprint, copies it, and
then builds the machine it needs. The biological term for such a blueprint is
"gene".

Using the instructions stored in a gene, the ribosomes build molecules called
proteins. Proteins that perform some sort of chemical reaction are the enzymes
I was talking about a bit earlier. Usually, if your focus is on what the thing
is made of, you'd call it a protein, and if your focus is on its function,
you'd say "enzyme". So let's have a look at how proteins are made.

As mentioned before, the instructions for building a protein are stored in a
blueprint, the gene. Genes are encoded on nature's universal storage system, a
molecule called "deoxyribonucleic acid", or DNA for short. DNA was discovered
in 1869 by Friedrich Miescher at the University of Tübingen, in this lab in the
basement of the Castle of Tübingen.

DNA consists of a linear backbone of sugar molecules (deoxyribose) linked by
phosphate groups. This backbone carries the actual information-containing
molecules, the nucleobases, or bases for short. There are four different bases
in DNA: adenine, thymine, guanine and cytosine, abbreviated as A, T, G and C
respectively. DNA turns out to be an efficient and robust storage medium for
information. This is partly because in nature a DNA strand usually comes
together with a backup copy, the so-called complement strand. The complement
strand runs in the opposite direction to the original strand, with adenine
pairing with thymine and guanine pairing with cytosine. Even if only one of the
strands is intact, it can be used to recover the complete set of information.
The two DNA strands wind around each other in the twisted double helix you
usually see when people talk about DNA.

In bioinformatics, you usually store only one strand, because calculating the
complement strand is trivial. So all you need to store is a (potentially pretty
long) sequence of As, Ts, Gs and Cs. To give you some rough numbers, the genome
of a virus might be around 15,000 bases (15 kb), a bacterium is in the low
megabase range, and a human has about 3 gigabases (Gb) worth of genome.
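
Just how trivial is calculating the complement strand? A minimal Python sketch,
assuming the sequence only contains the letters A, C, G and T (any other
character, such as N for an unknown base, is passed through unchanged). Because
the two strands run in opposite directions, bioinformaticians usually work with
the reverse complement: swap each base for its partner, then read the result
back to front.

```python
# Base pairing: A pairs with T, G pairs with C (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the other strand of a DNA sequence, read in its own direction.

    The strands run in opposite directions, so after swapping each base
    for its partner the result is reversed.
    """
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGCCTAA"))  # TTAGGCCAT
```

Since there are only four letters, each base fits into two bits, so even the
roughly 3 Gb human genome packs into about 750 MB.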

Genes are encoded in this four-letter alphabet using a word size of three. In
biology, these words are called codons. This means that there are four to the
power of three, or 64, possible codons. However, nature builds proteins from
only 20 different amino acids, the building blocks all proteins are made of.
And because it would be a shame to let the remaining 44 codons go to waste,
several different codons encode the same amino acid. This is called a
"degenerate" code, and it adds even more protection against changes to the
DNA, because a change in a single base often still yields the same amino acid.
The translation of codons into their corresponding amino acids is often
visualized in a codon wheel, like this one. Going from the center to the
outside, we can see, for example, that A-T-G encodes methionine. The three
special cases are TGA, TAA and TAG, the stop codons, which tell the ribosome to
stop.
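
The codon wheel is really just a lookup table. A minimal Python sketch of it,
using the standard genetic code written out in its conventional compact form
(64 one-letter amino acid codes, with the bases ordered T, C, A, G at each
codon position). The `translate` helper is illustrative only: it assumes clean
A/C/G/T input, starts reading at the very first base, and ignores a trailing
incomplete codon, which a real ribosome would not do.

```python
# Standard genetic code. Codons are listed with the bases ordered T, C, A, G
# at the first, second and third position (TTT, TTC, TTA, TTG, TCT, ...);
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(gene: str) -> str:
    """Read a DNA coding sequence three bases at a time until a stop codon.

    Amino acids are returned as one-letter codes (M = methionine, ...).
    Trailing bases that don't form a full codon are ignored.
    """
    protein = []
    for pos in range(0, len(gene) - 2, 3):
        amino_acid = CODON_TABLE[gene[pos:pos + 3].upper()]
        if amino_acid == "*":  # TAA, TAG or TGA: stop
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))                        # 64
    print(len(set(CODON_TABLE.values()) - {"*"}))  # 20
    print(CODON_TABLE["ATG"])                      # M
    print(translate("ATGGCCTAAGGG"))               # MA
```

The degeneracy is visible in the numbers: 64 codons, but only 20 amino acids
plus the stop signal.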

Once a cell decides it needs a specific machine, it makes a copy of the gene
and sends it to a ribosome to build a new protein. The copy is made of
ribonucleic acid, or RNA for short. RNA is similar to DNA, but there are some
chemical differences in the backbone and in one of the nucleobases; those
aren't really important for this part of my talk. What is important is that,
in contrast to DNA, RNA usually does not come with a complement strand. This
means that it's usually less stable, but much easier to process.

Because the RNA copy of a gene tells the ribosome what to produce, it is
called messenger RNA, or mRNA. The flow of information from DNA to mRNA to
protein is called the central dogma of molecular biology. For a long time, it
was believed to be the absolute rule at the foundation of the field. Of
course, as with all absolutes, there are exceptions. Still, it's a good rule of
thumb to go by.
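
The DNA to mRNA to protein flow can be sketched in Python as two chained
functions. The sketch assumes the copying enzyme reads the so-called template
strand, so the mRNA comes out complementary to that strand and reversed, just
like the complement strand of DNA, with uracil (U) in place of thymine (T);
that is the nucleobase difference between DNA and RNA. The function names are
illustrative, and real cells involve many more steps (start signals,
regulation) that are left out here.

```python
# Standard genetic code written for RNA: uracil (U) takes the place of thymine.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODONS = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Pairing between the DNA template and the RNA copy (note U, not T).
_DNA_TO_RNA_PAIR = str.maketrans("ATGC", "UACG")


def transcribe(template_strand: str) -> str:
    """DNA -> mRNA: build the RNA copy that pairs with the template strand.

    The copy runs in the opposite direction to the template, hence the
    reversal at the end.
    """
    return template_strand.upper().translate(_DNA_TO_RNA_PAIR)[::-1]


def translate(mrna: str) -> str:
    """mRNA -> protein: one amino acid per codon, until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = RNA_CODONS[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def express(template_strand: str) -> str:
    """The central dogma as function composition: DNA -> mRNA -> protein."""
    return translate(transcribe(template_strand))


if __name__ == "__main__":
    template = "TTAGGCCAT"
    print(transcribe(template))  # AUGGCCUAA
    print(express(template))     # MA
```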

Blueprints that are usually read together are often stored close to each other
on the genome. Such groups of genes are said to form a gene cluster. A common
way to illustrate how the genes in a cluster are organized is this kind of
picture, where the genes are drawn as coloured arrows. The direction of each
arrow shows which DNA strand the gene is encoded on. Remember, DNA comes in two
strands, and one acts as the backup copy of the other. There is no fixed rule
about which strand is the original and which is the backup: both strands carry
blueprints, and each serves as the backup for the other.
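
The "stored close to each other" idea can be turned into a toy grouping rule.
The Python sketch below is not how antiSMASH decides where a cluster starts and
ends; it simply assumes that neighbouring genes less than a hypothetical
`max_gap` number of bases apart belong together. Gene names and coordinates
are made up, and the strand sign plays the role of the arrow direction.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # first base of the gene on the genome
    end: int     # position just after the last base
    strand: int  # +1 = forward strand, -1 = reverse strand

    def arrow(self) -> str:
        """Text stand-in for the coloured arrows: direction shows the strand."""
        return f"{self.name}>" if self.strand > 0 else f"<{self.name}"


def group_into_clusters(genes: list[Gene], max_gap: int) -> list[list[Gene]]:
    """Group genes into clusters of neighbours at most max_gap bases apart."""
    clusters: list[list[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters and gene.start - max(g.end for g in clusters[-1]) <= max_gap:
            clusters[-1].append(gene)  # close enough: same cluster
        else:
            clusters.append([gene])    # too far away: start a new cluster
    return clusters


if __name__ == "__main__":
    genes = [
        Gene("geneA", 1_000, 2_500, +1),
        Gene("geneB", 2_600, 4_000, -1),
        Gene("geneC", 4_100, 5_000, +1),
        Gene("geneD", 50_000, 51_000, -1),
    ]
    for cluster in group_into_clusters(genes, max_gap=10_000):
        print(" ".join(gene.arrow() for gene in cluster))
    # geneA> <geneB geneC>
    # <geneD
```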

The processes a cell needs to carry on living are collectively called its
metabolism. Metabolism is all about feeding, growing and reproducing. Its
central parts are present in pretty much every living organism. Because living
means running the metabolism, it's going on all the time. When yeast eats sugar
under low-oxygen conditions, the ethanol it produces is actually a waste
product. So if you're drinking a beer, you're actually recycling what a yeast
cell would consider toxic waste.

Many microorganisms and plants also have something called secondary
metabolism. As opposed to the basic or primary metabolism, secondary
metabolism deals with building up substances that are not strictly required
for living. Examples include pigments that colour the petals of flowers. If a
plant were unable to produce a pigment, it wouldn't die right away. The same
applies to the secondary metabolites I'm interested in professionally:
antibiotics.

Many antibiotics are produced by bacteria. About 70% of the antibiotics on the
market originally come from Streptomycetes. When grown on agar plates, they
form these wrinkled colonies, which often contain coloured pigments.
Streptomycetes also produce the molecules people usually associate with the
smell of earth on a freshly tilled field. Because these bacteria are such
important producers of antibiotics, we focus much of our work on them.

How do antibiotics work, anyway? If we look at how a cell works, there are a
couple of key parts the cell absolutely requires to function. First, there is
the cell envelope: the cell wall and the membrane underneath it. The cell needs
it not only to keep all its other parts together, but also because the
electrical potential between the inside and the outside of the membrane is how
the cell powers itself. Many antibiotics attack the integrity of this envelope.
The penicillin-like antibiotics, which block the construction of the cell
wall, are the most widespread group here. Food additives like nisin attack the
envelope of bacteria as well and poke holes into it. Also, if we remember how
the cell produces proteins, pretty much every step is the target of some
antibiotic. Quinolones disrupt the enzymes that unwind the DNA for
replication. Antibiotics like rifampicin target the enzyme that makes the mRNA
copies. Aminoglycoside antibiotics target the ribosomes and stop them from
producing proteins. Finally, sulfonamides block an enzyme in a central
metabolic pathway, the one bacteria use to make folic acid. Remember, running
the metabolism means living, so if the metabolism stops, the cell dies.

It would be very hard to come up with substances that hit all these diverse
targets if we started from a clean slate. Fortunately, bacteria have been
waging war against each other for countless millennia already. All we need to
do to identify new antibiotics is to test whether the bacteria we discover
inhibit the growth of the bacteria we want to kill. A common way to run these
tests is a screening assay. In a screening assay, you grow the target bacteria
on an agar plate. On that agar plate, you put little paper discs carrying the
substances you want to test. The larger the clear inhibition zone around a
paper disc, the more effective the substance on that disc is against the
tested bacteria. In this picture from the US Centers for Disease Control and
Prevention, this substance is the least effective, and this substance is the
most effective.

This technique is a systematic repetition of Alexander Fleming's accidental
discovery that a Penicillium mould inhibits the growth of nearby
Staphylococcus bacteria. Even though Fleming's discovery was over 80 years ago,
systematic bioassays are still done this way. It's probably a good idea, too,
considering that penicillin and related substances are some of the most
versatile antibiotics known, with activity against a broad range of
microorganisms.

If penicillins are so great, why do we need more antibiotics? Unfortunately,
with the widespread use of antibiotics, we have been steering the evolution of
bacteria towards antibiotic resistance. If you look at this map of Europe, you
see the percentage of Staphylococcus bacteria identified in clinics that were
resistant to all the penicillin-related antibiotics we know. Starting from a
really low number in Scandinavia, the percentage rises the further south you
go. In pretty much all of the Mediterranean states, at least one in four
patients with a Staph infection can no longer be cured with penicillins.
I didn't find nice visual data for Australia, but a 1999 ABC report cited
figures between 20 and 40 percent of clinical Staph isolates being resistant
to penicillins. This number has likely risen over the last ten years.

How do bacteria become resistant in the first place? Some bacteria will always
carry a mutation that makes them less susceptible to a given antibiotic. If you
suddenly speed up evolution by killing off all the more vulnerable bacteria,
you're left with the resistant ones. And because they now have little
competition for room and food, they thrive even better. In the end, the average
resistance level in the population has risen. That's exactly what has been
happening in clinics all over the world since antibiotics were introduced.
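
The selection argument can be made concrete with a deliberately simple Python
model. It is a sketch under strong assumptions, not an epidemiological model:
resistance is all-or-nothing, each treatment round kills a fixed share of the
susceptible cells (99% here, an arbitrary choice) and none of the resistant
ones, and the survivors regrow to a fixed carrying capacity.

```python
def resistant_share(resistant: float, susceptible: float, rounds: int,
                    kill_rate: float = 0.99,
                    capacity: float = 1e9) -> list[float]:
    """Toy model of selection by repeated antibiotic treatment.

    Each round, the antibiotic kills `kill_rate` of the susceptible cells and
    none of the resistant ones. The survivors then regrow in proportion until
    the available room and food (`capacity` cells) is used up again.
    Returns the share of resistant cells after each round.
    """
    shares = []
    for _ in range(rounds):
        susceptible *= 1.0 - kill_rate                # treatment
        scale = capacity / (resistant + susceptible)  # regrowth factor
        resistant *= scale                            # both groups regrow
        susceptible *= scale                          # by the same factor
        shares.append(resistant / (resistant + susceptible))
    return shares


if __name__ == "__main__":
    # One resistant cell in a million, five rounds of treatment.
    for round_no, share in enumerate(resistant_share(1, 999_999, 5), start=1):
        print(f"round {round_no}: {share:.2%} resistant")
```

With a 99% kill rate, the ratio of resistant to susceptible cells grows a
hundredfold per round, so starting from one in a million, resistant bacteria
make up more than 99% of the population after five rounds.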

A really nasty feature in this respect is that bacteria are able to transfer
genetic material between different species. So even if the surviving bacteria
from this example are harmless, there's a chance that their resistance
mechanisms will be transferred to a more harmful bacterium. It is believed that
many of the more complex resistance mechanisms have spread through such
transfers from the original producer of an antibiotic. After all, a bacterium
producing an antibiotic has to be resistant to its own product, or it would
kill itself off.

You can speed up this process by using sublethal doses of antibiotics, which
often happens when antibiotics are misused. In countries where you can buy some
antibiotics off the shelf, like the US, antibiotic misuse is widespread. For
example, I was able to buy this tube of triple-antibiotic ointment at Walmart
for less than three dollars. If I misused it, I'd have a good shot at creating
bacteria resistant to three different antibiotics.

Remember how the central dogma of molecular biology went? From DNA to mRNA to
protein. Now let me show you one of the exceptions I was talking about earlier.
Some bacteria and moulds have a completely different way of building peptides,
which you can think of as short proteins. There is no blueprint that spells out
the product, no mRNA is involved, and the ribosome never sees anything in the
process. Instead, the cell builds a huge megaenzyme that works just like a
factory production line. Each of its many modules performs one well-defined
reaction. Then it gets the next piece of work and performs the exact same
reaction again. Rinse, repeat.

So why the heck does the cell bother with a whole new way of producing
peptides? First of all, compared to the proteins produced by a ribosome, the
factory-made peptides can contain unusual building blocks. The ribosome is a
multipurpose machine that can handle all 20 amino acids without requiring any
changes. A module in the production-line megaenzyme is specialized in handling
a single amino acid, but that amino acid can just as well be a non-standard
one.
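
A toy model of the production line in Python, assuming the simplest possible
picture: one module per building block, modules working strictly in order, and
nothing else (real modules are built from several parts, and products are
often modified further). The four-module line is hypothetical; ornithine (Orn)
and D-phenylalanine (D-Phe, the mirror image of the usual form) stand in for
building blocks the ribosome cannot use.

```python
from dataclasses import dataclass

# The 20 amino acids the ribosome works with (three-letter codes).
STANDARD_AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


@dataclass
class Module:
    """One station on the production line."""
    building_block: str  # the single amino acid this module is specialised in


def assemble(production_line: list[Module]) -> list[str]:
    """Hand the growing chain from module to module; each adds its block.

    The order of the modules alone determines the order of the product.
    """
    chain: list[str] = []
    for module in production_line:
        chain.append(module.building_block)
    return chain


def non_standard_blocks(production_line: list[Module]) -> list[str]:
    """Building blocks a ribosome could not have used."""
    return [m.building_block for m in production_line
            if m.building_block not in STANDARD_AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical four-module production line.
    line = [Module("Ala"), Module("Orn"), Module("D-Phe"), Module("Gly")]
    print(" + ".join(assemble(line)))  # Ala + Orn + D-Phe + Gly
    print(non_standard_blocks(line))   # ['Orn', 'D-Phe']
```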

Also, the production line approach can turn out much more product per unit of
time. While the ribosome builds its product one step at a time, the production
line performs all the steps in every cycle. So using a production-line
megaenzyme, the cell can pump out a lot of product really fast. Because this
system allows the cell to build peptides without involving a ribosome, it is
called non-ribosomal peptide synthesis. The megaenzyme is a non-ribosomal
peptide synthetase, or NRPS for short.
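
The speed argument is the same arithmetic as pipelining in a CPU. A minimal
Python sketch of the idealised picture: a product of `length` building blocks,
a ribosome that adds one block per step and works on one product at a time,
a production line with one module per block where every module is busy in
every cycle, and the assumption that one cycle takes as long as one ribosome
step. Real enzymes are messier than this.

```python
def ribosome_steps(products: int, length: int) -> int:
    """One building block per step, one product at a time."""
    return products * length


def production_line_steps(products: int, length: int) -> int:
    """One module per building block, all modules busy every cycle.

    The first product needs `length` cycles to travel down the line; after
    that, a finished product drops off the end every cycle.
    """
    if products == 0:
        return 0
    return length + products - 1


if __name__ == "__main__":
    print(ribosome_steps(1000, 10))         # 10000
    print(production_line_steps(1000, 10))  # 1009
```

For 1,000 products of 10 building blocks each, that is 10,000 ribosome steps
versus 1,009 production-line cycles.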

With the biological background out of the way, let's talk about how I'm using
antiSMASH to identify gene clusters involved in the production of
antibiotics. Remember the biology part; I'll be handing out a graded test at
the end of the talk. Sorry, giving talks in a university lecture hall triggers
my teaching reflexes.

antiSMASH, the antibiotics and secondary metabolite analysis shell, is a
modular pipeline that uses a host of existing bioinformatics tools to search
genomes for secondary metabolite gene clusters.

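
"Modular pipeline" can be illustrated with a generic plug-in pattern: every
analysis stage takes the results collected so far and adds its own, so stages
can be added, removed or reordered independently. This Python sketch is
hypothetical and does not reflect antiSMASH's actual code or module
interfaces; the two toy stages (GC content and ATG positions) merely stand in
for the external bioinformatics tools a real pipeline would call.

```python
from typing import Callable

# A stage reads the shared results dictionary and returns it with its own
# findings added.
Stage = Callable[[dict], dict]


def gc_content(results: dict) -> dict:
    """Hypothetical stage: share of G and C bases in the sequence."""
    seq = results["sequence"]
    results["gc_content"] = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return results


def start_codons(results: dict) -> dict:
    """Hypothetical stage: positions of ATG on the forward strand."""
    seq = results["sequence"]
    results["start_codons"] = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    return results


def run_pipeline(sequence: str, stages: list[Stage]) -> dict:
    """Run each stage in order on a shared results dictionary."""
    results: dict = {"sequence": sequence.upper()}
    for stage in stages:
        results = stage(results)
    return results


if __name__ == "__main__":
    report = run_pipeline("atgcgcATGaa", [gc_content, start_codons])
    print(report["start_codons"])  # [0, 6]
```

Adding a new analysis means writing one more function and appending it to the
list; nothing else in the pipeline has to change.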