
Software and supporting material for: "SmileFinder: a resampling-based platform to evaluate signatures of selection from genome-wide sets of matching allele frequency data between populations".

Dataset type: Software, Workflow
Data released on November 05, 2014

(2014): Software and supporting material for: "SmileFinder: a resampling-based platform to evaluate signatures of selection from genome-wide sets of matching allele frequency data between populations". GigaScience. https://doi.org/10.5524/100100

DOI: 10.5524/100100

SmileFinder is a simple program that looks for the sweep patterns left by historic selection in genome-wide allele frequency datasets by evaluating the diversity and difference between two or more populations of a diploid species against the neutral expectation. The program calculates the mean of heterozygosity and of variance in fixation index (FST) in a set of sliding windows of incrementally increasing sizes, and then builds a resampled distribution (the baseline) of random multi-locus sets matched to the sizes of the sliding windows, using unrestricted sampling. Percentiles of the values in the sliding windows are derived from the superimposed resampled distribution. The resampling can easily be scaled from 1K to 100M iterations; the higher the number, the more precise the percentiles ascribed to the extreme observed values.
The output from SmileFinder can be used to plot percentile values to look for signatures of selection along chromosome maps, or to compare lists of candidate genes to random gene sets to test for the over-representation of sweeps. Both uses of the algorithm have already been implemented in published studies. This publicly available, open-source program should become a useful tool for preliminary scans of selection using worldwide databases of human genetic variation, as well as population datasets for the many non-human species for which such data are rapidly emerging with the advent of new genotyping and sequencing technologies.
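
The window-and-resampling idea described above can be illustrated with a minimal Python sketch. This is not the SmileFinder source code: the function names, the toy allele frequencies, the heterozygosity formula 2p(1-p), the two-population FST formula (HT - HS)/HT, and the reading of "unrestricted sampling" as drawing loci with replacement from the whole dataset are all assumptions made for illustration.

```python
import random
from bisect import bisect_left


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus (assumed formula)."""
    return 2.0 * p * (1.0 - p)


def two_population_fst(p1, p2):
    """FST = (HT - HS) / HT for two equally weighted populations (assumed formula)."""
    h_total = expected_heterozygosity((p1 + p2) / 2.0)
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # locus fixed for the same allele in both populations
    return (h_total - h_within) / h_total


def window_means(values, half_width):
    """Mean over each full window of 2 * half_width + 1 consecutive loci."""
    size = 2 * half_width + 1
    return [
        sum(values[i:i + size]) / size
        for i in range(len(values) - size + 1)
    ]


def resampled_baseline(values, set_size, n_resamples, rng):
    """Sorted means of random multi-locus sets drawn with replacement."""
    baseline = [
        sum(rng.choices(values, k=set_size)) / set_size
        for _ in range(n_resamples)
    ]
    baseline.sort()
    return baseline


def percentile(baseline, observed):
    """Fraction of resampled means that fall strictly below the observed value."""
    return bisect_left(baseline, observed) / len(baseline)


# Toy data: allele frequencies at 200 loci in two hypothetical populations.
rng = random.Random(0)
pop1 = [rng.uniform(0.05, 0.95) for _ in range(200)]
pop2 = [min(0.99, max(0.01, p + rng.uniform(-0.1, 0.1))) for p in pop1]
het = [expected_heterozygosity(p) for p in pop1]
fst = [two_population_fst(p, q) for p, q in zip(pop1, pop2)]

half_width = 2
set_size = 2 * half_width + 1  # baseline sets are matched to the window size
het_windows = window_means(het, half_width)
het_baseline = resampled_baseline(het, set_size, 1000, rng)
het_percentiles = [percentile(het_baseline, m) for m in het_windows]
print(min(het_percentiles), max(het_percentiles))
```

In this sketch, windows with unusually low heterozygosity percentiles (and, computed the same way from `fst`, unusually high FST percentiles) would be the candidates for a closer look. Increasing `n_resamples` refines the percentiles assigned to the most extreme windows, at the cost of run time.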

A Galaxy workflow showing how SmileFinder can be used is available from GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk/u/gigascience/g/wguiblet2014).


Additional details

Read the peer-reviewed publication(s):

  • Guiblet, W. M., Zhao, K., O’Brien, S. J., Massey, S. E., Roca, A. L., & Oleksyk, T. K. (2015). SmileFinder: a resampling-based approach to evaluate signatures of selection from genome-wide sets of matching allele frequency data in two or more diploid populations. GigaScience, 4(1). https://doi.org/10.1186/2047-217x-4-1 (PubMed:25838885)

Github links:

https://github.com/wilfriedguiblet/smilefinder


| Description | Data Type | File Format | Size | Release Date | File Attributes |
| --- | --- | --- | --- | --- | --- |
| | Software | Python | 4.09 kB | 2014-07-23 | MD5 checksum: 94930842624f312402c77b4c6141a976 |
| | Text | TEXT | 2.08 GB | 2014-07-23 | MD5 checksum: d683ded198e010fd6a13ba2f55ac1df6 |
| | Text | TEXT | 15.07 MB | 2014-07-23 | MD5 checksum: b4f2913dfc3e3adb5a6e76db78c7ec43 |
| | Text | TEXT | 882 B | 2014-07-23 | MD5 checksum: 82b91067e872de5242b8e86db13f47ed |
| The results generated from Count.py | Allele frequencies | TEXT | 81.95 MB | 2014-07-23 | MD5 checksum: cf39bb9942249e91b1fddc244d1ecc54 |
| Information on using SmileFinder. | Text | TEXT | 3.29 kB | 2014-07-23 | MD5 checksum: 0f6680ccbe424db18763a01994b91c39 |
| | Annotation | TEXT | 112 B | 2014-07-23 | MD5 checksum: 0216dfe2c9aed88fe47e58b81a414280 |
| | Other | TEXT | 47.13 MB | 2014-07-23 | |
| | Software | Python | 10.76 kB | 2014-07-23 | MD5 checksum: a64de1b5137d8b9c5c8be7a637238f66 |
| | Image | PNG | 63.40 kB | 2014-10-20 | MD5 checksum: 18c05f3c50828b5030e5ea9b50a1778c |
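
The MD5 checksums listed for the files can be used to confirm that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the path `downloaded_file` are hypothetical placeholders, since the file names are not shown in the listing above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with a published checksum, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum("downloaded_file", "94930842624f312402c77b4c6141a976")` would return `True` only if the placeholder path pointed at an unmodified copy of the first Software file in the listing. Reading in chunks matters here because one of the text files is about 2 GB.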
Date Action
August 18, 2020 External Link added : https://github.com/wilfriedguiblet/smilefinder
August 18, 2020 Link removed : GitHub:wilfriedguiblet/smilefinder
June 21, 2021 Manuscript Link updated : 10.1186/2047-217X-4-1
April 26, 2024 Description updated