Problems and their Current Status

| Number | Type of Training Examples | Properties of Language | Log2(Complexity) | Training set | Testing sets |
|---|---|---|---|---|---|
| 1 | Positive & negative | Not regular, deterministic | 1.10E+09 | train.1.gz | test.1.gz (solved), test.1large.gz |
| 2 | Positive only | Not regular, deterministic | 7.12E+08 | train.2.gz | test.2.gz (solved), test.2large.gz (solved) |
| 3 | Positive & negative | Not regular, deterministic | 1.65E+10 | train.3.gz | test.3.gz (solved), test.3large.gz |
| 4 | Positive only | Not regular, deterministic | 1.13E+10 | train.4.gz | test.4.gz (solved), test.4large.gz (solved) |
| 5 | Positive & negative | Not regular, nondeterministic | 5.46E+10 | train.5.gz | test.5.gz (solved), test.5large.gz |
| 6 | Positive only | Not regular, nondeterministic | 6.55E+10 | train.6.gz | test.6.gz (solved), test.6large.gz |
| 7 | Positive & negative | Not regular, deterministic | 5.88E+11 | train.7.gz | test.7large.gz |
| 8 | Positive only | Not regular, deterministic | 1.63E+11 | train.8.gz | test.8large.gz |
| 9 | Positive & negative | Not regular, nondeterministic | 1.08E+12 | train.9.gz | test.9large.gz |
| 10 | Positive only | Not regular, nondeterministic | 9.92E+11 | train.10.gz | test.10large.gz |

These files represent 10 different problems, ranked in order of difficulty from 1 to 10. How the target problems and the training and testing sets were created is described here.

The winner of each individual problem will be the first contestant to submit a correctly labelled test set for that problem to the oracle. The winner of the competition will be the winner of the highest-ranking problem on its large test set (yes, test.1large dominates test.6). We reserve the right to add further problems to the competition if the initial set proves to be too simple.

File Formats

The files train.?.gz are sample strings labelled by the ten languages in the competition. You should use them to infer the languages. You can test your answers using test.?.gz, which contain strings that you classify and then check using the Omphalos Oracle.

The individual files above are compressed with gzip. Alternatively, you can get all the files in one shot (data-sets-ALL.tgz). Note that you can force Netscape to download to a file instead of displaying it by using Shift + left-click on the link.

In each file, the first line is a header giving the number of strings in the file and the number of symbols (for this competition always two: 0 and 1). Each succeeding line specifies one string. These lines have the format "label len sym_1 sym_2 ... sym_len", where len is the length of the string and sym_1 sym_2 ... sym_len are its symbols, separated by white space. The label 1 means accepted, the label 0 means rejected, and the label -1 means unknown (used in testing set files). So if the last line of a file were "0 7 1 0 0 0 1 1 1", it would indicate that the string 1000111 is rejected.
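The format described above can be read with a few lines of Python. This is a minimal sketch, not an official tool: the function names parse_samples and read_samples are hypothetical, and it assumes the header's two numbers are separated by white space and that the gzip files decode as plain text.

```python
import gzip
from typing import Iterable, List, Tuple

# One sample: (label, symbols). Label is 1 (accepted), 0 (rejected) or -1 (unknown).
Sample = Tuple[int, List[str]]


def parse_samples(lines: Iterable[str]) -> Tuple[int, int, List[Sample]]:
    """Parse a header line followed by 'label len sym_1 ... sym_len' lines.

    Returns (number of strings, number of symbols, list of samples).
    """
    it = iter(lines)
    header = next(it).split()
    n_strings, n_symbols = int(header[0]), int(header[1])

    samples: List[Sample] = []
    for line in it:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. at the end of the file
        label, length = int(fields[0]), int(fields[1])
        symbols = fields[2:]
        if len(symbols) != length:
            raise ValueError(f"expected {length} symbols, got {len(symbols)}")
        samples.append((label, symbols))
    return n_strings, n_symbols, samples


def read_samples(path: str) -> Tuple[int, int, List[Sample]]:
    """Open a gzip-compressed data file (hypothetical local path) and parse it."""
    with gzip.open(path, "rt") as fh:
        return parse_samples(fh)


# The example line from the text: the string 1000111 is rejected.
_, _, samples = parse_samples(["1 2", "0 7 1 0 0 0 1 1 1"])
label, symbols = samples[0]
print(label, "".join(symbols))  # prints: 0 1000111
```

Checking the declared length against the number of symbols actually present is a cheap way to catch truncated or corrupted downloads before any inference is attempted.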



Omphalos is being organized and administered by:

Brad Starkie, François Coste and Menno van Zaanen

You can contact them for comments and complaints at omphalos@irisa.fr

