yekhanin 2021-05-03 17:37:06 -07:00 committed by GitHub
Parent ef963805c4
Commit b7b1d757ac
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
1 changed file with 1 addition and 1 deletion

@@ -8,7 +8,7 @@ Proceedings of the International Symposium on Information Theory (ISIT), 2021. [
Our hope is that this dataset will enable further research progress in the area of *trace reconstruction* and DNA data storage by allowing objective comparison between various algorithms. The dataset is represented by two files:
-- **Centers.txt** This files contains 10,000 random strings of length 110 in the alphabet {A,C,G,T}.
+- **Centers.txt** This files contains 10,000 strings of length 110 in the alphabet {A,C,G,T} generated uniformly at random.
- **Clusters.txt** This file contains 269,709 noisy nanopore reads of DNA sequences corresponding to strings in the file **Centers.txt**. Reads are arranged into clusters separated by lines of multiple "=" signs. Clusters follow the same order as the strings in the file **Centers.txt**, i.e., the first cluster contains reads corresponding to the DNA sequence represented by first string in **Centers.txt**, the second cluster contains reads corresponding to the DNA sequence represented by the second string in **Centers.txt**, etc. Note that some of the clusters might be empty, i.e., there are no reads corresponding to some strings in **Centers.txt**.
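The layout described above can be read with a few lines of Python. The following is a minimal sketch, not code shipped with the dataset: the helper names `is_separator`, `parse_clusters`, and `pair_with_centers` are hypothetical, and it assumes that each cluster (including an empty one) is introduced by its own line of "=" signs. Reads that appear before any separator are tolerated and treated as the first cluster. The tiny sample data is made up for illustration.

```python
from typing import Iterable, List, Tuple


def is_separator(line: str) -> bool:
    """Return True for a non-empty line made up only of '=' characters."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"="}


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Group reads into clusters.

    Assumption: every separator line opens a new (possibly empty) cluster.
    """
    clusters: List[List[str]] = []
    for line in lines:
        if is_separator(line):
            clusters.append([])  # start a new cluster
        elif line.strip():
            if not clusters:
                # Tolerate input whose first cluster has no leading separator.
                clusters.append([])
            clusters[-1].append(line.strip())
    return clusters


def pair_with_centers(
    centers: List[str], clusters: List[List[str]]
) -> List[Tuple[str, List[str]]]:
    """Pair the i-th center with the i-th cluster, as the README describes."""
    if len(centers) != len(clusters):
        raise ValueError(
            f"{len(centers)} centers but {len(clusters)} clusters; "
            "the separator convention assumed here may not match the file"
        )
    return list(zip(centers, clusters))


# Made-up miniature example: three centers, the second has no reads.
sample_centers = ["ACGT", "TTGA", "GGCC"]
sample_clusters = """\
====
ACGT
ACT
====
====
GGCC
""".splitlines()

pairs = pair_with_centers(sample_centers, parse_clusters(sample_clusters))
for center, reads in pairs:
    print(center, len(reads))
```

For the real files, the same functions could be applied to open file handles, for example `parse_clusters(open("Clusters.txt"))` together with the stripped lines of `Centers.txt`. The length check in `pair_with_centers` is a cheap way to notice when the separator convention assumed here does not match the actual file.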