Software and supporting material for: "SmileFinder: a resampling-based platform to evaluate signatures of selection from genome-wide sets of matching allele frequency data between populations".

Dataset type: Software, Workflow
Data released on November 05, 2014

(2014): Software and supporting material for: "SmileFinder: a resampling-based platform to evaluate signatures of selection from genome-wide sets of matching allele frequency data between populations". GigaScience. https://doi.org/10.5524/100100

DOI: 10.5524/100100

SmileFinder is a simple program that looks for the sweep patterns left by historic selection in genome-wide allele frequency datasets by evaluating the diversity and difference between two or more populations of diploid species against the neutral expectation. The program calculates the mean of heterozygosity and of variance in fixation index (F_ST) in a set of sliding windows of incrementally increasing sizes, and then builds a resampled distribution (the baseline) of random multi-locus sets matched to the sizes of the sliding windows, using unrestricted sampling. Percentiles of the values in the sliding windows are derived from the superimposed resampled distribution. The number of resamples can easily be scaled from 1,000 to 100 million; the more resamples, the more precise the percentiles assigned to the extreme observed values.
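
The window-and-baseline procedure described above can be illustrated with a minimal Python sketch. It is not the code in smilefinder.py: the function names, the synthetic input values, the window sizes, and the reading of "unrestricted sampling" as sampling with replacement are all assumptions made for illustration.

```python
import random
from bisect import bisect_left
from statistics import fmean, pvariance


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def window_stat(values, center, half_width, stat):
    """Apply `stat` to the loci within `half_width` positions of index `center`.

    Windows near either end of the list are truncated rather than padded.
    """
    lo = max(0, center - half_width)
    hi = min(len(values), center + half_width + 1)
    return stat(values[lo:hi])


def resampled_baseline(values, n_loci, n_resamples, stat, rng):
    """Sorted `stat` values of random multi-locus sets of size `n_loci`.

    Assumption: loci are drawn with replacement from the whole genome.
    """
    return sorted(stat(rng.choices(values, k=n_loci)) for _ in range(n_resamples))


def percentile(baseline, observed):
    """Percentage of baseline values strictly below `observed`."""
    return 100.0 * bisect_left(baseline, observed) / len(baseline)


# Synthetic data only; real input would come from per-locus allele frequencies.
rng = random.Random(0)
het = [expected_heterozygosity(rng.random()) for _ in range(500)]
fst = [rng.random() * 0.3 for _ in range(500)]

demo_percentiles = {}
for half_width in (1, 2, 5):  # windows of 3, 5 and 11 loci
    size = 2 * half_width + 1
    # Mean heterozygosity in the window, ranked against a size-matched baseline.
    het_base = resampled_baseline(het, size, 1000, fmean, rng)
    het_pct = percentile(het_base, window_stat(het, 250, half_width, fmean))
    # Variance of F_ST in the same window, ranked the same way.
    fst_base = resampled_baseline(fst, size, 1000, pvariance, rng)
    fst_pct = percentile(fst_base, window_stat(fst, 250, half_width, pvariance))
    demo_percentiles[size] = (het_pct, fst_pct)
```

With 1,000 resamples the percentile can only be resolved in steps of 0.1, which is why a larger number of resamples gives finer estimates for extreme windows.
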
The output from SmileFinder can be used to plot percentile values to look for signatures of selection along chromosome maps, or to compare lists of candidate genes to random gene sets to test for the over-representation of sweeps. Both uses of the algorithm have already been implemented in published studies. This publicly available, open-source program should become a useful tool for preliminary scans of selection using worldwide databases of human genetic variation, as well as population datasets for the many non-human species for which such data are rapidly emerging with the advent of new genotyping and sequencing technologies.
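
The comparison of a candidate gene list with random gene sets can be sketched as a simple resampling test. This is an illustrative assumption about how such a test might be set up, not a procedure taken from SmileFinder or the associated paper; the function name, its parameters, and the gene labels are hypothetical.

```python
import random


def overrepresentation_p_value(candidates, sweep_genes, all_genes, n_sets, rng):
    """Estimate how often a random gene set holds at least as many sweep genes.

    candidates  -- list of unique candidate gene names
    sweep_genes -- genes flagged as lying in putative sweeps (assumed given)
    all_genes   -- list of every gene that could be drawn
    n_sets      -- number of random gene sets to draw
    """
    sweep = set(sweep_genes)
    observed = sum(gene in sweep for gene in candidates)
    hits = 0
    for _ in range(n_sets):
        # Random set of the same size as the candidate list, without repeats.
        random_set = rng.sample(all_genes, len(candidates))
        if sum(gene in sweep for gene in random_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sets + 1)


# Hypothetical gene labels, for illustration only.
genes = [f"gene{i}" for i in range(100)]
sweeps = genes[:5]
p_enriched = overrepresentation_p_value(genes[:5], sweeps, genes, 200, random.Random(0))
p_none = overrepresentation_p_value(genes[50:55], sweeps, genes, 200, random.Random(0))
```

A small value suggests that the candidate list contains more sweep genes than expected by chance. A candidate list with no sweep genes always returns 1.0.
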

A Galaxy workflow showing how SmileFinder can be used is available from GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk/u/gigascience/g/wguiblet2014).

Additional details

Read the peer-reviewed publication(s):

Guiblet, W. M., Zhao, K., O’Brien, S. J., Massey, S. E., Roca, A. L., & Oleksyk, T. K. (2015). SmileFinder: a resampling-based approach to evaluate signatures of selection from genome-wide sets of matching allele frequency data in two or more diploid populations. GigaScience, 4(1). https://doi.org/10.1186/2047-217x-4-1 (PubMed:25838885)

Github links:

https://github.com/wilfriedguiblet/smilefinder

Files
History

File Name	Description	Data Type	File Format	Size	Release Date	File Attributes
count.py		Software	Python	4.09 kB	2014-07-23	MD5 checksum: 94930842624f312402c77b4c6141a976
HGDP_FinalReport_Forward.txt		Text	TEXT	2.08 GB	2014-07-23	MD5 checksum: d683ded198e010fd6a13ba2f55ac1df6
HGDP_Map.txt		Text	TEXT	15.07 MB	2014-07-23	MD5 checksum: b4f2913dfc3e3adb5a6e76db78c7ec43
hgdp-popfile.txt		Text	TEXT	882 B	2014-07-23	MD5 checksum: 82b91067e872de5242b8e86db13f47ed
output.tab	The results generated by count.py	Allele frequencies	TEXT	81.95 MB	2014-07-23	MD5 checksum: cf39bb9942249e91b1fddc244d1ecc54
README.txt	Information on using SmileFinder.	Text	TEXT	3.29 kB	2014-07-23	MD5 checksum: 0f6680ccbe424db18763a01994b91c39
report.tab		Annotation	TEXT	112 B	2014-07-23	MD5 checksum: 0216dfe2c9aed88fe47e58b81a414280
SmileFinderCompleteTable.tab		Other	TEXT	47.13 MB	2014-07-23
smilefinder.py		Software	Python	10.76 kB	2014-07-23	MD5 checksum: a64de1b5137d8b9c5c8be7a637238f66
CUL5.png		Image	PNG	63.40 kB	2014-10-20	MD5 checksum: 18c05f3c50828b5030e5ea9b50a1778c

Displaying 10 files of 11
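
The MD5 checksums listed in the table can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names are hypothetical, and the local file path in the usage comment assumes the file has been saved to the working directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for large files such as the 2 GB report.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's MD5 digest equals the checksum listed for it."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example, assuming smilefinder.py was downloaded to the current directory:
# matches_checksum("smilefinder.py", "a64de1b5137d8b9c5c8be7a637238f66")
```
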

Date	Action
August 18, 2020	External link added: https://github.com/wilfriedguiblet/smilefinder
August 18, 2020	Link removed: GitHub:wilfriedguiblet/smilefinder
June 21, 2021	Manuscript link updated: 10.1186/2047-217X-4-1
April 26, 2024	Description updated