Technological advances have allowed the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure the activity of diverse cellular phenomena such as gene isoform expression and transcription factor binding. [...] “genome arithmetic” tasks. The individual tools in the BEDTools suite are each focused on a relatively simple operation, such as those illustrated in Figure 1. The goals of this unit are to introduce the basic concepts of genome arithmetic with BEDTools and to demonstrate, via biologically relevant examples, how analytical power is conferred through clever combinations of individual BEDTools functions. This unit is intended to give new users a sense of what is possible with the BEDTools suite. I encourage the reader to subsequently explore the BEDTools documentation (bedtools.readthedocs.org), since only the most widely useful subset of the nearly forty individual functions is covered.

Figure 1 Types of genome arithmetic functions

STRATEGIC PLANNING

Completion of the protocols covered will require a computer with a UNIX, Linux, or Apple OS X operating system. Microsoft Windows users may also complete the unit if they first install Cygwin, but Windows usage is not directly supported. In the following sections I will describe how to install BEDTools and other required software, as well as provide an overview of basic usage concepts.

Conventions

Throughout this unit I will demonstrate BEDTools usage via commands issued on the UNIX command line. Such commands will use a different font and appear in bold. Also, the “$” character is merely intended to represent the command prompt and should not be typed.

$ bedtools --help

[...]

The intersect command is the most widely used utility in the BEDTools suite. By default, it reports the subset of intervals that are common to your two files. The “A” file is considered the “query” file, whereas the “B” file is considered the “database” file.
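To make the default behavior concrete, the following is a minimal pure-Python sketch of what "intervals common to both files" means for a single query/database pair. It is an illustration of the concept only, not the BEDTools implementation: the function name intersect and the tuple layout (chrom, start, end, name) are assumptions made for this example, and the two intervals are taken from the CpG island and exon example files used in this unit. BED coordinates are 0-based and half-open, so two intervals overlap when the larger start is less than the smaller end.

```python
def intersect(a_intervals, b_intervals):
    """Hypothetical helper: report the shared portion of every overlapping A/B pair.

    Each interval is a (chrom, start, end, name) tuple in BED coordinates
    (0-based start, half-open end). This is a quadratic-time teaching sketch,
    not how BEDTools itself finds overlaps.
    """
    hits = []
    for chrom_a, start_a, end_a, name_a in a_intervals:
        for chrom_b, start_b, end_b, _name_b in b_intervals:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # at least one shared base
                # Keep the overlap coordinates plus the query ("A") name.
                hits.append((chrom_a, start, end, name_a))
    return hits


# Query ("A") and database ("B") intervals from the unit's example files.
cpg = [("chr1", 28735, 29810, "CpG:_116")]
exons = [("chr1", 29320, 29370, "NR_024540_exon10")]

for chrom, start, end, name in intersect(cpg, exons):
    print(chrom, start, end, name, sep="\t")
```

The reported interval spans only the 50 bases shared by the CpG island and the exon, rather than the full length of either input interval.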
To demonstrate the basic functionality of the utility, we will use the BED files we downloaded in the Strategic Planning section to identify CpG islands that overlap exons in the human genome.

Necessary Resources

See Support Protocol 1

1 Display the first five BED intervals reflecting CpG islands.

$ head -n 5 cpg.bed
chr1 28735 29810 CpG:_116
chr1 135124 135563 CpG:_30
chr1 327790 328229 CpG:_29
chr1 437151 438164 CpG:_84
chr1 449273 450544 CpG:_99

[...] tool reports reflects the subset of 50 base pairs that actually overlapped an exon. Rather than report solely the intersecting intervals, it is often desirable to instead report the original intervals that intersected from both files. For each intersection between the two input files, the “write A” and “write B” options (-wa and -wb) report the original interval from the “A” and the “B” file, respectively.

3 Alternative: show overlaps with both CpG and exon coordinates (-wa -wb).

$ bedtools intersect -a cpg.bed -b exons.bed -wa -wb | head -n 5
chr1 28735 29810 CpG:_116 chr1 29320 29370 NR_024540_exon10
chr1 135124 135563 CpG:_30 chr1 134772 139696 NR_039983_exon0
chr1 327790 328229 CpG:_29 chr1 324438 328581 NR_028322_exon2
chr1 327790 328229 CpG:_29 chr1 324438 328581 NR_028325_exon2
chr1 327790 328229 CpG:_29 chr1 327035 328581 NR_028327_exon3

[...] the number of intervals that intersect each “query” interval.

3 Alternative: show the count of exons that overlap CpG islands (-c).

$ bedtools intersect -a cpg.bed -b exons.bed -c | head -n 5
chr1 28735 29810 CpG:_116 1
chr1 135124 135563 CpG:_30 1
chr1 327790 328229 CpG:_29 3
chr1 437151 438164 CpG:_84 0
chr1 449273 450544 CpG:_99 0

[...] overlap exons.

3 Alternative: show those CpG islands that do not overlap exons (-v).

$ bedtools intersect -a cpg.bed -b exons.bed -v | head -n 5
chr1 437151 438164 CpG:_84
chr1 449273 450544 CpG:_99
chr1 533219 534114 CpG:_94
chr1 544738 546649 CpG:_171
chr1 801975 802338 CpG:_24

[...] base pair of overlap to be reported as output.
However, there are many cases where the biological question at hand demands stricter criteria. For example, if one is interested in studying exons that have a role in transcript regulation, one could start by using the -f 0.5 option to identify CpG islands where.
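The reporting options discussed in this protocol can be illustrated with a small pure-Python sketch. It is only an approximation of the semantics, not the BEDTools implementation: the helper names (overlap_bp, count_hits, without_hits, with_min_fraction) and the tuple layout are assumptions made for this example, and with_min_fraction returns query/database pairs for readability rather than the clipped intervals. The sketch assumes that -f sets the minimum overlap as a fraction of the “A” interval's length. The intervals are a small subset of the CpG island and exon records shown in the steps above.

```python
def overlap_bp(a, b):
    """Number of bases shared by two (chrom, start, end, name) BED intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def count_hits(a_intervals, b_intervals):
    """Analogue of -c: each A interval plus the number of B intervals it overlaps."""
    return [
        a + (sum(1 for b in b_intervals if overlap_bp(a, b) > 0),)
        for a in a_intervals
    ]


def without_hits(a_intervals, b_intervals):
    """Analogue of -v: A intervals that overlap no B interval at all."""
    return [
        a for a in a_intervals
        if not any(overlap_bp(a, b) > 0 for b in b_intervals)
    ]


def with_min_fraction(a_intervals, b_intervals, min_frac):
    """Analogue of -f: A/B pairs whose overlap covers at least min_frac of A."""
    pairs = []
    for a in a_intervals:
        a_len = a[2] - a[1]
        for b in b_intervals:
            shared = overlap_bp(a, b)
            if shared > 0 and shared / a_len >= min_frac:
                pairs.append((a, b))
    return pairs


cpg = [
    ("chr1", 28735, 29810, "CpG:_116"),
    ("chr1", 327790, 328229, "CpG:_29"),
    ("chr1", 437151, 438164, "CpG:_84"),
]
exons = [
    ("chr1", 29320, 29370, "NR_024540_exon10"),
    ("chr1", 324438, 328581, "NR_028322_exon2"),
    ("chr1", 324438, 328581, "NR_028325_exon2"),
    ("chr1", 327035, 328581, "NR_028327_exon3"),
]

print([row[3:] for row in count_hits(cpg, exons)])
print([a[3] for a in without_hits(cpg, exons)])
print([(a[3], b[3]) for a, b in with_min_fraction(cpg, exons, 0.5)])
```

On this subset, CpG:_116 shares only 50 of its 1075 bases with an exon, so it is counted as a hit but falls well below a 0.5 fraction threshold, whereas CpG:_29 lies entirely within three exons and passes it.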