GeoAssign

The web-based program assigns test individuals to groups (e.g. country of origin, population of origin) according to their genotypes or metric traits. The groups are defined by reference individuals. The user therefore needs to upload one data file for the test individuals (“Test file”) and one data file for the reference individuals (“Reference file”). The assignment is done using a nearest neighbour approach (Degen et al. 2017, doi: http://dx.doi.org/10.1016/j.fsigen.2016.12.011).

Structure of the user interface

The user interface is subdivided into three parts:

  1. Upload area for input files
    Running the GeoAssign server requires two files. The first file (called "test file" in the following) contains the data of the samples whose origin has to be verified or assigned. The second file (called "reference file" in the following) contains the reference samples, which are already grouped into populations. You can upload your test and reference files by clicking the corresponding buttons in the upload area. The file format is described in the "Input data files" section. As a third step, the input data type has to be selected: you can choose between genetic data and metric data such as isotopic ratio data.
  2. Area for calculation details
    Here, you can adjust the calculation details.
    Percentile of most similar individuals (0.1 - 20%): This parameter controls the number of nearest neighbours, i.e. the reference individuals with the smallest genetic/metric distance to the test individual. It is given as a percentile of all reference individuals ordered by their genetic/metric distance to the test individual.
    Minimum proportion of complete loci / traits (0.1 - 1): The genotype or metric trait data of the test and reference individuals are often incomplete. Missing data are coded by “-1”. This parameter sets the minimum proportion of complete data an individual must have, and thus how much missing data is tolerated. For example, if you select “0.8”, test individuals and reference individuals are included in the calculations only if they have data for at least 80% of their loci or metric traits.
  3. Output file details area
    The GeoAssign program has two output modes: by default, it generates only summary data on the assignment. If you uncheck "Print only summary files", one additional file is generated for each test sample. For more information about the output files, see the "Output files" section.
    The "Prefix for output files" field gives the user the opportunity to choose a prefix for all output files.
    After the calculation has finished, you receive an email containing a download link for your results. It is therefore necessary to specify an email address.
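
The percentile parameter described above can be illustrated with a minimal Python sketch. It is not the GeoAssign implementation: the Euclidean distance, the rounding rule (round up, keep at least one neighbour) and the names `euclidean` and `nearest_neighbours` are assumptions made for illustration, and the sketch assumes complete metric data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length trait vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbours(test, references, percentile):
    """Return the reference individuals closest to the test individual.

    test:        list of trait values of the test individual
    references:  list of (group, traits) tuples
    percentile:  percentage (0.1 - 20) of all reference individuals to keep
    """
    # Order all reference individuals by their distance to the test individual.
    ranked = sorted(references, key=lambda ref: euclidean(test, ref[1]))
    # Assumption: round up and always keep at least one neighbour.
    k = max(1, math.ceil(len(ranked) * percentile / 100))
    return ranked[:k]

# Hypothetical reference data: two groups, two traits per individual.
references = [
    ("A", [1.0, 1.0]),
    ("A", [1.2, 0.9]),
    ("B", [5.0, 5.1]),
    ("B", [4.8, 5.3]),
    ("B", [5.2, 4.9]),
]
neighbours = nearest_neighbours([1.1, 1.0], references, percentile=20)
print(neighbours)
```

With five reference individuals and a percentile of 20, the sketch keeps the single closest reference individual. Larger reference sets give proportionally more neighbours.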

Input data files

The input files are simple ASCII files. They can be created with any commonly used editor. (Note: The program requires a point "." as the decimal separator (e.g. "125.12"). If you are not using the UK or US regional configuration, make sure that your operating system supports this setting.) The file content consists of a header part and a data part. The header part provides a few numbers needed for the assignment calculation. The data part is line based: each line represents one sample. Variables are separated by “,”. Missing data are indicated by “-1”. The format of the reference file and the test file is identical and is explained in the following figures:
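
As a rough illustration of the conventions above (comma-separated values, "." as decimal separator, “-1” for missing data), the following Python sketch parses the value columns of one data line and applies the "Minimum proportion of complete loci / traits" threshold. The function names are hypothetical, and the sketch assumes that every value is one metric trait. The full column layout, the header part and the pairing of allele columns for genetic data are not covered here.

```python
def parse_values(text):
    """Parse comma-separated values; "-1" (missing data) becomes None."""
    values = []
    for field in text.split(","):
        number = float(field)  # requires "." as the decimal separator
        values.append(None if number == -1 else number)
    return values

def proportion_complete(values):
    """Proportion of values that are not missing."""
    return sum(v is not None for v in values) / len(values)

def is_included(values, min_proportion):
    """True if the individual has enough complete data to be used."""
    return proportion_complete(values) >= min_proportion

# Hypothetical value columns of one sample: one of five traits is missing.
sample = parse_values("125.12,-1,98.4,101.0,77.5")
print(proportion_complete(sample))
print(is_included(sample, min_proportion=0.8))
```

In this example four of five values are present, so the sample just meets a threshold of “0.8”.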

Figure 1: Example input file with genetic data for diploid species. Note: Haploid data can be entered as homozygotes.
Figure 2: Example input file with quantitative traits.

Note: For a successful GeoAssign calculation, it is essential that the header and the data header line of the test file and the reference file are identical!

Explanatory notes on the different fields:

Output files

The programme computes four different types of output files. The file names are composed of the user-defined prefix and a suffix specific to each file type:

Name
    Content
prefix_Index.csv
    For each tested individual, the values of the assignment indices in the different reference groups. The index represents the proportion of the nearest neighbours weighted by the size of the reference group.
prefix_Ex_Prob.csv
    For each tested individual, the values of the exclusion probabilities in the different reference groups. As an indicator of exclusion (outlier), the programme computes, for each reference group, the relative frequency of genetic/metric distances among pairs of individuals within the group that are smaller than the distances between the test individual and the reference individuals.
prefix_Ind_Num.txt
    An ASCII text file for each tested individual “Ind_Num” with the genetic/metric distances between the test individual and all reference individuals, ordered from the smallest to the largest distance.
prefix_Indentical_Genotypes.csv
    This file is optional and is generated only for genetic data. It includes a list of reference individuals with identical genotypes.
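
The description of the assignment index ("the proportion of the nearest neighbours weighted by size of the reference group") can be read in more than one way. The Python sketch below shows one possible reading, in which the share of a group among the nearest neighbours is divided by the share of that group among all reference individuals. This weighting and the function name `assignment_index` are assumptions made for illustration. The formula actually used by GeoAssign is given in Degen et al. 2017.

```python
from collections import Counter

def assignment_index(neighbour_groups, reference_groups):
    """Assumed index: share among neighbours divided by share among references.

    neighbour_groups:  group labels of the nearest neighbours of one test individual
    reference_groups:  group labels of all reference individuals
    """
    k = len(neighbour_groups)
    n = len(reference_groups)
    neighbour_counts = Counter(neighbour_groups)
    reference_counts = Counter(reference_groups)
    return {
        # Counter returns 0 for groups that have no neighbour.
        group: (neighbour_counts[group] / k) / (size / n)
        for group, size in reference_counts.items()
    }

# Hypothetical example: 10 reference individuals in group A, 30 in group B,
# and 4 nearest neighbours of which 3 belong to group A.
reference_groups = ["A"] * 10 + ["B"] * 30
neighbour_groups = ["A", "A", "A", "B"]
index = assignment_index(neighbour_groups, reference_groups)
print(index)
```

Under this reading, a value above 1 means that a group is over-represented among the nearest neighbours relative to its size in the reference data, and a value below 1 means it is under-represented.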

Further questions

If you have further questions or comments on the GeoAssign web server, please contact us.