Why genome alignment




















Many methods also disregard small, marginally significant local alignments for the sake of speed. As a result, at a local level, the results of current WGA methods often leave room for improvement. To remedy this situation, a number of methods have been developed that may be used to refine WGAs.

They can be generally grouped into one of three categories. The first is composed of methods that refine the local structure of a WGA. A second category of methods focuses on optimizing individual WGA blocks with respect to an objective function.

The last category includes methods that perform alignment while taking into account the structure and evolutionary dynamics of certain classes of genomic elements. Such inversions are represented by alignments that would typically not have statistically significant scores at the genome level but can be detected via probabilistic models of local sequence evolution.

In contrast to PicoInversionMiner, which identifies novel rearrangement events, Cassis refines the coordinates of breakpoints. The refinements produced by Cassis are the result of identifying weak similarities between sequences adjacent to segments of an initial orthology map and extending the boundaries of segments based on these similarities.

The BAR algorithm of Cactus, which we have previously discussed in the context of hierarchical WGA, is also an alignment refinement method that identifies breakpoints. Other methods for refining WGAs focus on improving local colinear multiple alignments with respect to a given objective function. The PSAR-Align method [ 73 ] instead realigns blocks to optimize an expected accuracy objective function [ 74 ] using pairwise alignment probabilities estimated by the PSAR tool [ 75 ] and the sequence annealing algorithm of the FSA multiple alignment method [ 62 ].

Lastly, a number of methods have been developed that can improve the alignments of specific classes of genomic elements, such as gene structures. The primary goal of these methods is generally to improve prediction of genomic elements, but a more accurate alignment often results as a side product. Among the oldest of such methods are comparative gene finders that perform protein-coding gene prediction and pairwise alignment simultaneously.

A related method, CESAR [ 81 ], was specifically designed for realignment and targets individual coding exons rather than full gene structures. Other methods focus on improving the alignment of noncoding regulatory regions by modeling the evolution of sets of transcription factor-binding sites with known motifs e.

Like the comparative gene finders, these methods also use statistical alignment techniques but with models extended to take into account the conservation of binding sites instead of gene structures. Just as for small-scale alignment (Chapter 7, [ 1 ]), assessing the accuracy of WGAs is hard because we rarely know the true evolutionary history of a set of genome sequences. In fact, the evaluation of WGAs is even harder than that of protein alignments.

In addition, WGAs must be assessed not only for whether they align truly homologous sequences but also for whether they correctly predict orthologous or toporthologous relationships.

Thus, the evaluation of WGAs is related to that of gene orthology prediction, which is discussed in Chapter 9 [ 5 ]. Despite these challenges, a number of creative approaches have been used for determining the accuracy of WGA methods. The approaches generally fall into four categories: (1) simulation, (2) analysis of alignments to annotated regions, (3) comparison with predictions from other methods, and (4) alignment statistics.

Simulated data are appealing for evaluation as we know the entire evolutionary history of the simulated sequences and can thus thoroughly evaluate the accuracy of an alignment. Many of the WGA methods described in this chapter have used simulations for assessing their accuracies [ 8 , 47 , 52 , 54 , 62 ]. The Alignathon [ 87 ], one of the most comprehensive evaluations of WGA methods to date, relied heavily on simulated data sets.

This study called attention to one potential pitfall of simulation-based evaluation, which is that the performance of a WGA method may be overestimated when that method was developed or trained with respect to the same simulator used for the assessment. Simulating the evolution of whole genomes is a challenging task, and it is unclear if the current models used for simulation are close to reality.

Such models are highly complex, as they have to account for many different types of evolutionary events, at both the small and large scales. For example, they need to model the random mutations of both single-nucleotide substitutions and megabase-sized inversions. In addition, they also need to model natural selection, which alters the probability of these random mutations becoming fixed within a population. For example, an inversion that cuts an essential gene in half might have a much lower probability of becoming fixed than an inversion with both end points in intergenic regions.
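To make the two scales of events concrete, here is a toy sketch, not any of the published simulators, that applies one inversion and then independent point substitutions to an ancestral sequence while recording where each descendant base came from. The function names, the single fixed inversion, and the uniform substitution model are all assumptions made for illustration; selection, indels, and duplications are ignored.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitute(seq, rate, rng):
    """Apply independent point substitutions at the given per-base rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def invert(seq, origin, start, end):
    """Reverse-complement seq[start:end] and update the origin map.

    origin[i] is (ancestral_position, strand) for descendant position i.
    """
    segment = "".join(COMPLEMENT[b] for b in reversed(seq[start:end]))
    flipped = [(pos, -strand) for pos, strand in reversed(origin[start:end])]
    new_seq = seq[:start] + segment + seq[end:]
    new_origin = origin[:start] + flipped + origin[end:]
    return new_seq, new_origin


def simulate(ancestor, sub_rate, inversion, seed=0):
    """Return the descendant sequence and its true origin map."""
    rng = random.Random(seed)
    origin = [(i, +1) for i in range(len(ancestor))]
    seq, origin = invert(ancestor, origin, *inversion)
    return substitute(seq, sub_rate, rng), origin
```

Because the origin map is known exactly, any alignment of the descendant to the ancestor can be scored base by base against it, which is the property that makes simulated data attractive for evaluation.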

Despite these challenging model details, a number of genomic evolution simulators have been developed. Currently, only three simulators model both small-scale events e. Other simulators focus only on nonrearranging events [ 8 , 91 , 92 , 93 , 94 , 95 , 96 , 97 , 98 ] and are thus good for evaluating colinear genomic aligners but not homology mapping methods. A second class of approaches to evaluating WGAs leverages our knowledge of various classes of elements within the genome.

Specificity can also be roughly assessed with coding regions, either by counting the number of coding bases that are aligned to noncoding bases in other genomes [ 36 , ] or by checking that alignments in coding regions exhibit periodicities in their substitution patterns [ 99 ].
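One simple way to look for the periodicity mentioned above is to tally mismatches by codon position within an aligned coding region; a correct alignment would be expected to show an excess at the third position. The sketch below assumes a gapped pairwise alignment given as two equal-length strings and a known reading frame in reference coordinates. The function name and the `frame` parameter are hypothetical.

```python
def mismatches_by_codon_position(ref_aln, qry_aln, frame=0):
    """Count mismatches at codon positions 1, 2 and 3 of the reference.

    ref_aln and qry_aln are aligned strings of equal length with '-' for gaps.
    frame is the reference coordinate of the first base of a codon.
    """
    counts = [0, 0, 0]
    ref_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # query insertion: no reference coordinate to advance
        if q != "-" and r != q:
            counts[(ref_pos - frame) % 3] += 1
        ref_pos += 1
    return counts
```

For example, `mismatches_by_codon_position("ATGGCCAAA", "ATGGCTAAG")` places both differences at the third codon position.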

A related approach that instead assesses the accuracy of eukaryotic orthology maps is to check if exons from the same gene are mapped in the same order and orientation to other genomes [ 47 ]. However, the fact that genic regions are often highly conserved is also a disadvantage of using them for evaluation; the most conserved regions are the easiest to align, and some aligners use exon annotation information or translated matches.
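The exon order and orientation check can be sketched as follows. This is a minimal illustration, assuming each exon of a gene has already been mapped to a single query location; the tuple layout and function name are hypothetical.

```python
def exons_colinear(mapped_exons):
    """Check that exons map in consistent order and orientation.

    mapped_exons lists (query_chrom, query_start, strand) for each exon,
    given in the order the exons occur along the reference gene.
    """
    if len(mapped_exons) < 2:
        return True
    chroms = {chrom for chrom, _, _ in mapped_exons}
    strands = {strand for _, _, strand in mapped_exons}
    if len(chroms) != 1 or len(strands) != 1:
        return False
    starts = [start for _, start, _ in mapped_exons]
    if strands == {"-"}:
        starts.reverse()  # on the minus strand the order is mirrored
    return all(a < b for a, b in zip(starts, starts[1:]))
```

The fraction of annotated genes for which this check passes gives a rough, gene-level indicator of orthology map accuracy.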

Because of these issues, repeat sequences, which are believed to evolve more neutrally, have been used for alignment evaluation [ 47 , 99 ]. For example, in [ 99 ], sensitivity was assessed by alignments of ancestral repetitive elements, and specificity was inferred from the number of alignments to lineage-specific repeat elements (in this study, primate-specific Alu repeats). Another common evaluation technique is to compare whole-genome aligners against other related methods.

This technique is useful for judging the similarity of different WGAs but, unfortunately, does not provide much information about accuracy. Another technique is to compare with the results from gene orthology prediction programs [ 48 , 49 ]. The advantage of this approach is that it provides a more independent test of accuracy, since gene orthology prediction programs generally use different algorithms and information sources to infer orthology.

The disadvantages of this approach are that it only provides a gene-level measure of accuracy and does not evaluate alignments of noncoding regions.

In addition, since WGA and gene orthology prediction share similar goals, we might expect that future methods will blend techniques from both and thus that this evaluation approach will decrease in usefulness. A last class of evaluation techniques involves the computation of statistics for WGAs. These statistics can be subdivided into simple descriptive statistics and measures computed via statistical or sampling techniques.

Generally, the higher the coverage, the more sensitive the WGA is believed to be, although one can easily create high-coverage WGAs with poor sensitivity. As a check of large-scale specificity in mammalian WGAs, the authors of [ 47 ] checked the fraction of the X chromosome that was covered by alignments to autosomal chromosomes in other genomes (the assumption being that translocations into and out of the X chromosome are rare in mammals).
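As an illustration, coverage can be read as the fraction of reference bases that fall inside at least one aligned block. The sketch below assumes that definition, with blocks given as half-open `(start, end)` intervals on a single reference sequence; the function name is hypothetical.

```python
def coverage(blocks, genome_length):
    """Fraction of reference bases covered by at least one aligned block.

    blocks is an iterable of half-open (start, end) intervals; overlapping
    blocks are counted only once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(blocks):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length
```

The X chromosome check described above amounts to computing the same quantity while restricting the blocks to those whose query side lies on an autosome.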

More sophisticated statistics related to WGA accuracy are computed through the use of statistical or sampling techniques. Just as they are used for BLAST, Karlin and Altschul statistics [ ] may be used to assess the significance of local pairwise alignments between genomes. StatSigMA extends these statistics to multiple alignments [ ], and StatSigMA-w further extends this technique to detect dubiously aligned regions in WGAs of multiple genomes [ ].

Within a multiple alignment, a number of techniques have been developed for estimating the accuracy of the alignment of pairs of residues or entire columns, including simply computing an alignment of reversed sequences [ ], computing alignments with bootstrapped guide trees [ ], sampling suboptimal multiple alignments [ 75 ], and evaluating consistency within a library of alternative alignments [ ].

Despite the substantial progress made in WGA methodology development, there are a number of challenges that remain unsolved. First, we are in need of WGA methods that can scale to hundreds or thousands of genomes. Along with ever-improving sequencing technology, we are accumulating whole-genome sequences at an increasing rate. Projects such as the Genome 10K Community of Scientists [ ], which aims to collect and sequence the genomes of 10,000 vertebrate species, will further push the WGA problem to new scales.

While most WGA algorithms have been made efficient for long genomes, very few are practical for large numbers of genomes. However, methods scaling to thousands of genomes for the full WGA task or for mammalian-sized genomes do not currently exist. In addition to algorithmic advances, we will also be in need of novel approaches for storing and representing WGAs of thousands of genomes. Second, advances are needed in the parameterization of WGA methods.

Current methods are littered with large numbers of parameters that are often heuristic in nature and not easily determined. In some cases, the default parameters for a WGA method may be markedly suboptimal [ ].

One solution to this problem is to adopt probabilistic models, which offer principled approaches to parameter estimation, such as maximum likelihood. In fact, probabilistic models of sequence evolution have already been adopted for the alignment of colinear genomic segments and have been shown to offer improved accuracy [ 47 , 62 ].

In addition, most WGA methods use models or scoring schemes that assume homogeneous rates of evolution across the genome. This assumption is obviously violated in real data, and new methods will need to be developed that take this into account. Simulated noncoding genomic alignments that represent a heterogeneous mix of evolutionary rates have been developed and should be useful for the development of new WGA methodology [ 97 ].

Lastly, more attention must be paid to the fact that a WGA is typically just a single estimate of the evolutionary history of a set of genomes and portions of this estimate may be highly uncertain. Encouragingly, methods for colinear genomic alignment have drawn attention to this issue at the nucleotide level [ 62 , ]. However, the issue of uncertainty at the large-scale orthology map level has not been sufficiently studied, perhaps due to the lack of probabilistic models for that level of the WGA problem.

In addition, most efforts to address uncertainty in alignments simply assign levels of confidence to the components of a single alignment. It may be more useful to be presented with a set of near-optimal alignments so that alternative evolutionary histories can be examined by downstream analyses [ ].

The determination and representation of uncertainty for all scales of a WGA will likely remain a challenging problem as the number of genomes included in alignments increases. Visualize the resulting alignment with the mummerplot program. How many colinear blocks are there in the alignment?

How many inversion events are implied by the alignment? Search for and view the CFTR gene, mutations in which cause the disease cystic fibrosis. Examine the Mouse Net track in the visualization and note the color of the mouse net alignments. Looking at the net alignments for all of the placental mammals, does it appear that CFTR has been conserved across this clade?

The evolutionary scenario to be considered for Exercise 3. MUMmer4 significantly raises these size limits, making it possible to align even the largest genomes. The available computer hardware provides a more practical limit; MUMmer4 uses 15 bytes per base of the reference sequence to store the index. Thus, for example, if the reference sequence were 66 Gb (equal to 22 human genomes), the suffix array would require about 1 TB of computer RAM. Specifically, it loads X bases of reference sequence, matches all query sequences against the first batch, then loads the next X bases of reference and so on.
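The memory arithmetic and the batching idea described above can be sketched in a few lines. This is only an illustration of the calculation, not MUMmer4 code: the 15 bytes per base figure comes from the text, while the function names are hypothetical and the batches here are cut at arbitrary base offsets rather than at sequence boundaries.

```python
BYTES_PER_BASE = 15  # index cost per reference base, as quoted in the text


def index_bytes(ref_bases):
    """Estimated index size in bytes for a reference of ref_bases bases."""
    return ref_bases * BYTES_PER_BASE


def batch_size_for_memory(ram_bytes):
    """Largest number of reference bases whose index fits in ram_bytes."""
    return ram_bytes // BYTES_PER_BASE


def batches(ref_bases, batch_bases):
    """Yield half-open (start, end) reference ranges of at most batch_bases."""
    for start in range(0, ref_bases, batch_bases):
        yield start, min(start + batch_bases, ref_bases)
```

For a 66 Gb reference, `index_bytes(66 * 10**9)` gives 990 × 10^9 bytes, which matches the roughly 1 TB quoted above. Choosing the batch size from the available RAM trades peak memory for repeated passes over the query sequences.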

In practice this eliminates any restrictions on the size of the input reference sequence and also allows the user to tune the alignment runs to the total available computer memory. MUMmer4 has no absolute limit on the total query sequence size. Also, as a minor improvement, MUMmer4 removes the previous character limit on the length of the names of reference and query sequences.

MUMmer4 now includes options to save and load the suffix array for a given reference. The most popular systems for aligning reads to a reference genome, Bowtie2 [ 16 ] and BWA [ 4 ], both assume that their index (a Burrows-Wheeler transform using the FM index) has been pre-computed and stored in a file, which allows the alignment step to run much faster. Suffix array construction is primarily a single-threaded task that can take about 36 minutes for a 3 Gb genome.

Many large genomes— e. Thus there is no need to build the suffix array on the fly at the time of alignment if one intends to use the same reference repeatedly, e. Using the new MUMmer4 option, the suffix array can be built once, saved and then loaded for each run. For example, the size of the suffix array for the human genome is approximately 39 GB. Nucmer4 requires additional memory in scenarios when running with multiple threads on query sequences that are large.

Our parallelization routine distributes multiple query sequences into multiple threads, one sequence per thread, and query sequences have to be loaded into memory. The step of loading multiple query sequences into memory at the same time increases peak memory usage in such scenarios, proportional to the number of threads used.

With only one thread, memory usage is similar between nucmer3 and nucmer4. The original output format of nucmer, the delta format, contains only the minimum information necessary to quickly recreate the alignment.

It contains the names of the matching sequences, the length of the match, the number of errors and the positions of indels. Nucmer4 supports two different options for SAM format output. With --sam-short, nucmer4 reports only the name of the matching sequence, its length, and the CIGAR string, which reports the indel positions. With --sam-long, it additionally reports the MD string (which specifies the mismatching positions), the sequence and, if applicable, the quality values of the matching sequence.

The long format is more expensive to compute and it generates larger output files, but this option allows nucmer4 to match the behavior of other aligners such as Bowtie2 or BWA.
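To show how indel positions in a delta-style record relate to a CIGAR string, the sketch below converts a list of signed indel offsets into CIGAR operations. It assumes the convention, as we understand it from the MUMmer documentation, that a positive value marks a reference base with no query counterpart (CIGAR `D`), a negative value marks a query base with no reference counterpart (CIGAR `I`), and the magnitude is the distance from the previous indel. The function name and the `ref_span` argument are hypothetical, and this is not nucmer4's own conversion code.

```python
def delta_to_cigar(deltas, ref_span):
    """Convert signed indel offsets into a CIGAR string.

    deltas: signed distances between successive indels (0 terminates).
    ref_span: number of reference bases covered by the alignment.
    'M' is used for both matches and mismatches.
    """
    ops = []  # list of [length, operation]

    def push(length, op):
        if length <= 0:
            return
        if ops and ops[-1][1] == op:
            ops[-1][0] += length  # merge with the previous run
        else:
            ops.append([length, op])

    ref_used = 0
    for d in deltas:
        if d == 0:
            break
        run = abs(d) - 1  # aligned columns before this indel
        push(run, "M")
        ref_used += run
        if d > 0:
            push(1, "D")  # reference base absent from the query
            ref_used += 1
        else:
            push(1, "I")  # query base absent from the reference
    push(ref_span - ref_used, "M")  # aligned columns after the last indel
    return "".join(f"{n}{op}" for n, op in ops)
```

Since `M` does not distinguish matches from mismatches, mismatch positions still have to come from a separate field such as the MD string, which is why the long format is the more expensive one to produce.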

We transformed the global variables in the original code to object instance variables. As a result, it is possible for an application using libumdmummer to instantiate multiple aligner objects concurrently, for example in a multi-threaded program.

We used the SWIG [ 17 ] tool to generate the script bindings, allowing developers to create bindings to the many other languages supported by SWIG with little extra work. This binding allows a user to align a pair of sequences directly from the scripting languages, returning an array of the alignments.

The enhancements in MUMmer4 allow it to align (1) a pair of genomes to each other, (2) two large sets of sequences to one another, or (3) a set of reads to a reference genome. Although specialized read aligners are more accurate and sensitive and thus likely preferable for that task, we show below that when run with default settings, the speed of MUMmer4 is comparable to Bowtie2, BWA and BLASR for the alignment of both short low-error-rate Illumina and long high-error-rate PacBio sequences to a reference genome.

Table 1 shows detailed feature comparison between MUMmer4 and the other aligners. A checkmark means the feature is present and usable, otherwise the feature is absent or its use is impractical. Features that are absent by design are marked with a dash. We also show substantial improvements in speed and versatility compared to the MUMmer3 package. In the Supplementary material, we report all settings and command line parameters used for generating the results shown here. For the comparisons shown here, we used two different organisms as reference genomes.

For mapping reads to a genome, we used the Arabidopsis thaliana Col-0 reference genome [ 18 ] and the human reference genome, version GRCh We removed all alt sequences from the human reference sequence, because they could skew our human-chimpanzee genome-to-genome comparison statistics.

Detailed information about these data sets is shown in Table 2. The Illumina and PacBio data for A. The reference genomes are the Arabidopsis thaliana Col-0 reference genome [ 18 ], the human reference genome version GRCh The primary usage scenario for nucmer3 was to align two genome assemblies or two reference genomes. In this section we demonstrate improvements in timings for such alignments due to parallelization in nucmer4.

We use several pairs of plant and animal genomes. Table 3 summarizes the timings and memory usage for the alignments that we ran.

Nucmer3 was unable to align the chimp reference to the human reference due to limitation in the size of the reference sequence max Mbp. Nucmer4 peak memory usage is higher both due to its bit index, and due to loading 32 large query sequences at a time for parallel processing, but it runs significantly faster than Nucmer3. Below we provide details on the nucmer4 alignments. We list both wall clock time and CPU time to show how effectively the code utilizes multiple cores.

Nucmer4 is the fastest, but not the most memory-efficient, aligner. Nucmer3 failed to align the human and chimp assemblies due to the restriction on the size of the reference sequence.

LASTZ and Mauve runs on human to chimp alignments took over two days, and we stopped them after that. LASTZ defaults are optimized for high sensitivity, resulting in slow performance. Thus for fairness of timing comparisons we ran LASTZ twice: once with default settings and once with parameters that result in sensitivity matching that of nucmer4 with default settings. We list the parameters in the supplement. First we aligned the current assemblies of human and chimpanzee, using the default nucmer4 options with 32 parallel threads.

Note that Nucmer3 cannot perform an alignment this large unless one first breaks both genomes into smaller pieces. We used human as the reference and chimpanzee as the query sequence. The human GRCh38 assembly contains 3. Note that the chimpanzee genome is far less polished than human, and much of the extra DNA might be explained by haplotype variants or incompletely merged regions; thus the two genomes might be much closer in size than these numbers indicate.

MUMmer had 2. The 1. Similarly, a translocation would align as a second level net. However, the requirement that a locus in the reference can overlap at most one net implies that nets cannot represent duplications in the query genome Kent et al.

Chain-breaking alignments (CBAs) in mammalian genome alignments. (A) A genomic inversion results in two overlapping chains. Two CBAs highlighted in red break the lower level chain, representing a single inversion event, into three separate nets, which would imply that three inversion events happened.

Removing both CBAs results in a single net that correctly indicates a single inversion. (B) A chain-breaking alignment in the top-level chr3 chain breaks the lower level chr5 chain, representing a 4. Removing this chain-breaking alignment results in a single net, which spans the full BTBD8 and its neighboring genes. The red block aligns a retroposed GAPDH pseudogene that likely was inserted independently into this locus in both human and mouse.

The orthologous alignments of two NCOA7 exons are masked by these pseudogene alignments, which harbor numerous gene-inactivating mutations Supplementary Figure 2.

Removing the CBAs would keep the entire lower level chain as one syntenic net. A key feature of the net-building algorithm is that it takes all aligning blocks of the top-scoring chain as the top-level net. This implies that if the top-level chain contains, for example, non-orthologous alignments between the reference and the query genome, these alignments will become aligning blocks in the top-level net. Since nested lower-scoring chains can only fill gaps in a higher-scoring net, the nested chain could be broken into a number of smaller nets Fig.

This can lead to situations where nets do not represent the correct rearrangement history, for example, by inflating the number of rearrangements that occurred Fig. We define two types of CBAs. True CBAs need to be removed from the breaking chain in order to result in nets representing the correct rearrangement history with the limitation that nets cannot represent duplications in the query genome. These removed alignments then form a new chain and can become a new nested net.

In contrast, false CBAs should not be removed from the breaking chain, because this chain results in a net that already represents the correct rearrangement history. Terminology and illustration of the score ratio. B Illustration of the ratio between the minimum score of the upstream and downstream broken chain parts and the score of the CBA. Apart from obscuring the rearrangement history, true CBAs can have other undesirable consequences.

First, true CBAs can mask alignments between exons of orthologous genes, for example, if the breaking chain contains alignments to a processed pseudogene. In the case shown in Figure 1C , the pseudogene alignments reveal numerous gene-inactivating mutations, from which one would incorrectly infer gene loss Supplementary Fig.

Second, since low scoring nets are less likely to represent an orthologous alignment, one often filters out nets with a score below a minimum threshold Kent et al. Consequently, if the broken chain is broken into a number of smaller nets, some of these individual nets can fall below the score threshold and would be incorrectly filtered out.

This is shown in Figure 1C , where several orthologous aligning blocks are missed in the final genome alignment. Together, true CBAs impair both the specificity and sensitivity of genome alignments.

Given that the accuracy of genome alignments is crucial for comparative genomics, we developed a fast method, called chainCleaner, to detect and remove true CBAs from the breaking chains. We systematically tested this method on vertebrate genome alignments at a variety of evolutionary distances and show that chainCleaner improves the alignment of many orthologous genes and rescues nets that would otherwise be incorrectly filtered out.

Our method chainCleaner takes a set of alignment chains as input and removes CBAs from these chains. The rationale of chainCleaner is the following: if the part of the broken chain that surrounds the CBA represents an orthologous alignment and a single rearrangement, then the local score of the broken chain should be higher than the score of the CBA Fig. This is in contrast to the scores of the entire chains, where, by definition, the breaking chain scores higher than the broken chain.

Using the chain-scoring scheme developed in Kent et al. Given the set of chains, chainCleaner computes the score ratio for every observed CBA and removes those CBAs where the score ratio is above a certain threshold. First, chainCleaner nets the chains using chainNet and removes all individual nets with a score lower than These objects store the identifiers, pointers to the breaking and broken chain, the coordinates of the CBA and the coordinates of non-aligning regions gaps upstream and downstream of the CBA.

The latter coordinates correspond to the two regions where parts of the broken chain fill the gaps in the net that corresponds to the breaking chain. To assure that the broken chain likely represents an orthologous alignment, chainCleaner only considers broken chains with a score higher than 50 For mammalian alignments, where chain scores are generally higher, we used 75 as a threshold parameter -minBrokenChainScore 75 This scoring scheme iterates over all aligning blocks and adds the scores of all local ungapped alignments and a cost that penalizes the gap between two adjacent blocks depending on the gap size in the reference and query assembly Kent et al.

We noticed that a CBA can comprise several aligning blocks spread over a larger region. The score of the CBA can then be negative, for example, if the CBA comprises one solid and one weak aligning block that are separated by a large distance.

To avoid underestimating the score of the CBA, we scored CBAs with a modified scoring scheme that is analogous to a local alignment score. This modified scheme also iterates over all aligning blocks but records the maximum score reached and resets the running score to 0 whenever it falls below 0. Then, we compute the ratio between the minimum score of the upstream and downstream broken chain parts and the score of the CBA.
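The difference between the plain chain score and the local-alignment-like CBA score, and the resulting score ratio, can be sketched as follows. This is a simplified illustration under stated assumptions, not chainCleaner's implementation: block scores and gap costs are taken as precomputed numbers, and all function names are hypothetical.

```python
def chain_score(block_scores, gap_costs):
    """Plain additive score: block scores minus the costs of the gaps
    between adjacent blocks (gap_costs has one entry per adjacent pair)."""
    return sum(block_scores) - sum(gap_costs)


def local_score(block_scores, gap_costs):
    """Local-alignment-like score: keep a running score that is reset to 0
    whenever it drops below 0, and report the maximum reached."""
    best = 0
    running = 0
    for i, score in enumerate(block_scores):
        if i > 0:
            running = max(0, running - gap_costs[i - 1])
        running = max(0, running + score)
        best = max(best, running)
    return best


def score_ratio(upstream_score, downstream_score, cba_score):
    """Minimum of the flanking broken-chain scores divided by the CBA score.
    A CBA score of 0 is treated as an infinite ratio (an assumption here)."""
    if cba_score <= 0:
        return float("inf")
    return min(upstream_score, downstream_score) / cba_score
```

With one solid block (1000) and one weak block (200) separated by a gap costing 5000, `chain_score` returns -3800 while `local_score` returns 1000, which is the underestimation the modified scheme avoids. A CBA would then be flagged for removal when `score_ratio` exceeds the user-given threshold.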

If this score ratio is above a user-given threshold 2. By default, chainCleaner does not consider CBAs that score higher than This new chain can become a new net if it fills a gap and is above a minimum score threshold. Since a breaking chain can have more than one CBA in close proximity, chainCleaner updates the size of the upstream and downstream gap in the breakInfo structures and iteratively tests if further CBAs should be removed.

In addition, chainCleaner also tests if a pair of CBAs should be removed together (parameter -doPairs). Considering pairs allows removing CBAs that are very close to each other, in which case the score of the upstream or downstream part of the broken chain would not be very high Supplementary Fig. We recompute the chain score for all breaking chains where CBAs have been removed. The output of chainCleaner is a cleaned and score-sorted chain file, and a file in bed format that lists the coordinates and information of each removed CBA.

For testing chainCleaner on independent species, we used rat (rn6), guinea pig (cavPor3), rabbit (oryCun2), dog (canFam3), Tasmanian devil (sarHar1), zebra finch (taeGut2), duck (anaPla1), Chinese softshell turtle (pelSin1), fugu (fr3) and medaka (oryLat2). One-to-one orthologs were downloaded from Ensembl Biomart Kinsella et al.

Then, we tested for all aligning blocks in all chains if a block aligns an exon of a human gene to its ortholog in the query species. For each human exon for which this was the case, we obtained the coordinates and the chain identifier.

We used chainCleaner with parameter -suspectDataFile to obtain the coordinates and score ratios of all chain-breaking alignments, without removing any of them and without considering pairs of CBAs. Then we overlapped all CBAs with the coordinates of the genic regions (defined as the region between the first and last coding exon with an orthologous alignment for this gene) and the coordinates of the exons that align to the ortholog.

A false CBA overlaps an alignment between exons of orthologous genes and breaks a lower-level chain that is unlikely to represent an orthologous alignment Fig. Exons that align between orthologous genes are used to obtain a training set of true and false chain-breaking alignments.

(A) Illustration of exons that align between orthologous genes: Genes with the same color are orthologs.

Only coding exons are considered. The top-level chain aligns the three exons of the red gene to its ortholog, however, this chain also aligns exons 2 and 3 of the blue gene to a potential paralog purple. The lower level chain aligns exons 1 and 3 of the blue gene to its ortholog. Note that exon 2 of the blue gene and exon 1 of the yellow gene align, but neither of them align to the ortholog.

(B) A CBA that is located between the first and the last exon of a gene and breaks the chain that represents the orthologous alignment (the lower level chain here) is considered to be a true CBA. Before removing this CBA, the orthologous exon alignments of the blue gene are located on two separate nets.

After removing this CBA, all orthologous exon alignments are located on a single net, which increases the maximum number of aligning exonic bases (alignment coverage) observed for a single net.

(C) An alignment between an exon of an orthologous gene that breaks a lower level chain is considered to be a false CBA. In this case, the top-level chain represents the ortholog alignment. Removing this CBA decreases the alignment coverage.

The human hg38 genome assembly was used as the reference genome. We integrated running chainCleaner and highly sensitive local alignments Hiller et al. For non-placental mammals, we also used highly sensitive local alignments Hiller et al.

Microbat Myotis lucifugus 1 alignment Human Homo sapiens. Arabian camel Camelus dromedarius 2 alignments 2 syntenies Human Homo sapiens.

Alpaca Vicugna pacos 2 alignments Human Homo sapiens. Dolphin Tursiops truncatus 2 alignments Human Homo sapiens. Beluga whale Delphinapterus leucas 2 alignments Human Homo sapiens.

Narwhal Monodon monoceros 2 alignments Human Homo sapiens. Sperm whale Physeter catodon 2 alignments 2 syntenies Human Homo sapiens. Vaquita Phocoena sinus 2 alignments 2 syntenies Human Homo sapiens. Blue whale Balaenoptera musculus 2 alignments 2 syntenies Human Homo sapiens. Pig - Hampshire Sus scrofa 1 alignment Pig Sus scrofa. Pig - Largewhite Sus scrofa 1 alignment Pig Sus scrofa.

Pig - Wuzhishan Sus scrofa 1 alignment Pig Sus scrofa. Pig - Berkshire Sus scrofa 1 alignment Pig Sus scrofa. Pig - Landrace Sus scrofa 1 alignment Pig Sus scrofa. Pig - Rongchang Sus scrofa 1 alignment Pig Sus scrofa. Pig Sus scrofa 17 alignments 5 syntenies Human Homo sapiens. Pig - Bamei Sus scrofa 1 alignment Pig Sus scrofa. Pig - Tibetan Sus scrofa 1 alignment Pig Sus scrofa.

Pig - Pietrain Sus scrofa 1 alignment Pig Sus scrofa. Pig - Meishan Sus scrofa 1 alignment Pig Sus scrofa.

Pig - Jinhua Sus scrofa 1 alignment Pig Sus scrofa. Chacoan peccary Catagonus wagneri 3 alignments Human Homo sapiens.

Goat Capra hircus 2 alignments 2 syntenies Human Homo sapiens. Sheep Ovis aries 2 alignments 2 syntenies Human Homo sapiens. American bison Bison bison bison 2 alignments Human Homo sapiens.

Cow Bos taurus 22 alignments 14 syntenies Human Homo sapiens. Domestic yak Bos grunniens 2 alignments 2 syntenies Human Homo sapiens. Wild yak Bos mutus 2 alignments Human Homo sapiens. Yarkand deer Cervus hanglu yarkandensis 2 alignments 2 syntenies Human Homo sapiens.

Siberian musk deer Moschus moschiferus 2 alignments Human Homo sapiens. Shrew Sorex araneus 1 alignment Human Homo sapiens. Hedgehog Erinaceus europaeus 1 alignment Human Homo sapiens.

Horse Equus caballus 5 alignments 4 syntenies Human Homo sapiens. Donkey Equus asinus asinus 2 alignments Human Homo sapiens. Polar bear Ursus maritimus 2 alignments Human Homo sapiens. American black bear Ursus americanus 2 alignments Human Homo sapiens.

Giant panda Ailuropoda melanoleuca 1 alignment 1 synteny Dog Canis lupus familiaris. Dog Canis lupus familiaris 15 alignments 7 syntenies Human Homo sapiens. Dingo Canis lupus dingo 2 alignments Human Homo sapiens. Red fox Vulpes vulpes 2 alignments Human Homo sapiens. American mink Neovison vison 2 alignments Human Homo sapiens. Ferret Mustela putorius furo 2 alignments Human Homo sapiens. Cat Felis catus 5 alignments 3 syntenies Human Homo sapiens. Leopard Panthera pardus 3 alignments Human Homo sapiens.

Lion Panthera leo 3 alignments 3 syntenies Human Homo sapiens. Tiger Panthera tigris altaica 3 alignments Human Homo sapiens. Elephant Loxodonta africana 1 alignment Human Homo sapiens. Lesser hedgehog tenrec Echinops telfairi 1 alignment Human Homo sapiens. Hyrax Procavia capensis 1 alignment Human Homo sapiens. Sloth Choloepus hoffmanni 1 alignment Human Homo sapiens. Armadillo Dasypus novemcinctus 1 alignment Human Homo sapiens.

Opossum Monodelphis domestica 7 alignments 4 syntenies Human Homo sapiens. Tasmanian devil Sarcophilus harrisii 1 alignment 1 synteny Opossum Monodelphis domestica.


