Wednesday, March 11, 2009

SAX based time series motif search primer

This post is an illustration of the SAX-based motif search in timeseries.
Let's assume that we are given the next timeseries:(1,1,3,5,8,7,6,2,3,4,3,1,1,5,4,5,2,3,4,6,9,8,5,2,3,4,2,5,3,2,5,6,8,9,0,3,3), the length is 37 points:

Now if we run the SAX algorithm using sliding window of size 7, no PAA and alphabet size 6, this is what the resulting substrings matrix looks like:

[aabdfee, abdfeea, bdfeeab, dfeeabb, ffeabcb, ffbcdca, fbdedaa, bdedaaf, dedaafe, dcaafdf, daafefb, aafefbd, afdfbcd, eceabcf, cdabcef, cabbdff, abbdffc, bbdffca, bdffcab, dffcabb, ffdbbcb, fdabcad, faceafc, bdebfdb, ceafcaf, daebaef, adbadef, cbacdff, bbddffa, bddffab, ddffabb]

if we compute all pairwise distances over this strings we will find next matches (distance between strings is zero, and the second column indicates substings indexes):

ffeabcb - ffdbbcb, [4] - [20]
cabbdff - cbacdff, [15] - [27]

which corresponds to next two motifs:

No comments: