Source: bioresearch.byu.edu
Next-Generation Sequencing
Computational Sciences Laboratory, Brigham Young University
Problem Statement
•Map next-generation sequence reads with variable nucleotide confidence to a model reference genome that may be different from the subject genome.
▫Speed
Tens of millions of reads to a 3 Gbp genome
▫Accuracy
Mismatches included?
Repetitive regions
▫Visualization
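The Speed and Accuracy goals pull against each other. A naive baseline shows why. Below is a minimal sketch, assuming substitution-only alignment (no insertions or deletions) and ignoring per-base nucleotide confidence. It tries every start position in the genome and keeps those where the read differs in at most `max_mismatches` bases. The name `naive_map` and its parameters are hypothetical, and this is not GNUMap's method. Its cost per read grows with genome length times read length, which is impractical for tens of millions of reads against 3 Gbp.

```python
def naive_map(read: str, genome: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Hypothetical baseline: return (start, mismatches) for every start position
    where `read` aligns to `genome` with at most `max_mismatches` substitutions.
    Insertions/deletions and per-base confidence are ignored."""
    hits = []
    m = len(read)
    for start in range(len(genome) - m + 1):
        mismatches = 0
        for read_base, genome_base in zip(read, genome[start:start + m]):
            if read_base != genome_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this start position can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


genome = "ACTGAACCATACGGGTACTGAACCATGAATGGCACCTATACGAGATACGC"
print(naive_map("AACCATACG", genome, max_mismatches=0))  # [(4, 0)]
```

Counting mismatches instead of requiring exact equality lets a read from a subject genome that differs from the reference still be placed. With `max_mismatches=3`, the same read is also reported at position 20 with 3 mismatches.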
Workflow
Indexing the genome
•Fast lookup of possible hit locations for the reads
▫Hashing groups locations in the genome that share the same sequence content
A k-mer hash of exact matches in the genome can be used to narrow down possible match locations for reads
▫Sorting genome locations by sequence content provides content addressing of the genome
•GNUMap indexes all 10-mers in the genome and uses them as seed points for read mapping
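The sorting-based content addressing and seed lookup can be sketched as follows. This minimal sketch uses a toy genome and k = 5 instead of GNUMap's 10-mers so the results can be checked by hand. Every genome position is paired with the k-mer that starts there, and the pairs are sorted. Identical content then forms one contiguous run, which binary search can find; this resembles a suffix array truncated to k letters. A seed found at read offset i and genome position p suggests the read starts at p - i. The names `build_sorted_index`, `lookup`, and `candidate_starts` are hypothetical, and this is not GNUMap's implementation.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(genome: str, k: int) -> tuple[list[str], list[int]]:
    """Pair each genome position with the k-mer starting there, then sort by k-mer
    so that locations with identical content form one contiguous run."""
    entries = sorted((genome[p:p + k], p) for p in range(len(genome) - k + 1))
    kmers = [kmer for kmer, _ in entries]
    positions = [pos for _, pos in entries]
    return kmers, positions


def lookup(kmers: list[str], positions: list[int], kmer: str) -> list[int]:
    """Binary-search the run of entries whose k-mer equals `kmer` (positions ascending)."""
    lo = bisect_left(kmers, kmer)
    hi = bisect_right(kmers, kmer)
    return positions[lo:hi]


def candidate_starts(read: str, kmers: list[str], positions: list[int], k: int) -> list[int]:
    """Use every k-mer of `read` as a seed; a seed at read offset i that occurs at
    genome position p suggests the read starts at p - i."""
    starts = set()
    for offset in range(len(read) - k + 1):
        for pos in lookup(kmers, positions, read[offset:offset + k]):
            if pos - offset >= 0:
                starts.add(pos - offset)
    return sorted(starts)


genome = "ACTGAACCATACGGGTACTGAACCATGAATGGCACCTATACGAGATACGC"
kmers, positions = build_sorted_index(genome, k=5)
print(lookup(kmers, positions, "AACCA"))                     # [4, 20]
print(candidate_starts("AACCATACG", kmers, positions, k=5))  # [4, 20, 33, 40]
```

Here 4 of the 42 possible start positions for the 9-base read survive as candidates, and only those need a full comparison. For scale, there are 4^10 = 1,048,576 distinct 10-mers. If a 3 Gbp genome were uniformly random, each 10-mer would occur about 3×10^9 / 4^10 ≈ 2,861 times. Repetitive regions make some seeds far more frequent than that.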
Building the Hash Table
Sliding window indexes all locations in the genome
[Figure: Hash Table. A sliding window moves along the example sequence ACTGAACCATACGGGTACTGAACCATGAATGGCACCTATACGAGATACGC; the k-mers it covers, such as AACCA (AACCAT) and CATAC, are stored in the hash table with their locations.]
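The following minimal Python sketch makes the figure concrete. It assumes k = 5, the width of keys such as AACCA and CATAC in the figure, and uses 0-based positions; GNUMap itself uses 10-mers. `build_hash_table` is a hypothetical name. Because the window advances one base at a time, every position that can start a full k-mer is indexed. A 50-base sequence therefore yields 46 entries.

```python
from collections import defaultdict


def build_hash_table(genome: str, k: int) -> dict[str, list[int]]:
    """Slide a window of width k along `genome`, one base at a time, and record
    for each k-mer every 0-based position where it starts."""
    table = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        table[genome[pos:pos + k]].append(pos)
    return dict(table)


# Example sequence from the slide; k = 5 matches the keys shown in the figure.
GENOME = "ACTGAACCATACGGGTACTGAACCATGAATGGCACCTATACGAGATACGC"
table = build_hash_table(GENOME, 5)
print(table["AACCA"])  # [4, 20]
print(table["CATAC"])  # [7]
```

Both occurrences of AACCA land in the same bucket, which is how the hash table groups locations with identical sequence content. With k = 6, AACCAT is likewise found at positions 4 and 20.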