Sunday, May 31, 2009

Some normalization issues.

While shaping up the code for the pilot data set analysis, I figured out, while browsing motifs and plots, that normalization to zero mean and unit energy failed to produce "proper" results in the following two cases:
[1] when there is only a single value in the sub-series
[2] when all values are the same, i.e. the deviation is 0
which subsequently led to some weird patterns.

Fixed!
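
The post does not record how the fix was made, so the following is only a minimal sketch of one common way to guard the two cases above. It is written in Python for brevity; the function name `znorm`, the `threshold` parameter, and the choice to map degenerate sub-series to all zeros are assumptions, not the actual Trajectory code.

```python
import math


def znorm(values, threshold=1e-9):
    """Z-normalize a sub-series to zero mean and unit standard deviation.

    Assumption (not from the original code): a sub-series that has a
    single value, or whose deviation is (near) zero, is mapped to all
    zeros instead of being divided by zero.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        # Case [1]: a single value has no spread to normalize by.
        return [0.0]
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std < threshold:
        # Case [2]: all values are the same, i.e. deviation = 0.
        return [0.0] * n
    return [(v - mean) / std for v in values]
```

With this guard a flat sub-series such as `[3, 3, 3]` comes out as all zeros, so it lands in the middle of the alphabet instead of producing undefined values.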

I am also still a bit confused by the appearance of some "interesting" motifs like this one:


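The raw values look a bit unaligned to me: the last symbol of the motif is "c" for both trajectories, but the raw value behind this "c" in the red trajectory is lower than the one behind "a" in the blue trajectory. But I guess it's all about scale, since each sub-series is normalized on its own before it is converted to symbols.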
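
The scale effect can be illustrated with a small Python sketch. It assumes an alphabet of size 3 with the usual SAX breakpoints of about -0.43 and 0.43, assigns one symbol per point (no PAA step), and uses made-up numbers rather than the pilot data; the names `znorm` and `to_symbols` are hypothetical.

```python
import math
from bisect import bisect_left

# Assumed breakpoints for a 3-letter alphabet (equiprobable regions of N(0, 1)).
BREAKPOINTS = (-0.43, 0.43)
ALPHABET = "abc"


def znorm(values):
    """Z-normalize; constant or single-value input maps to zeros (assumption)."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return [0.0] * n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def to_symbols(values):
    """Map each normalized point to a letter, one symbol per point."""
    return "".join(ALPHABET[bisect_left(BREAKPOINTS, v)] for v in znorm(values))


red = [1, 2, 3]      # made-up low-magnitude trajectory
blue = [10, 20, 30]  # made-up high-magnitude trajectory

print(to_symbols(red))   # abc
print(to_symbols(blue))  # abc
```

Both sub-series map to the same word even though the raw value behind red's "c" (3) is lower than the raw value behind blue's "a" (10), because the symbols describe position relative to each sub-series' own mean and deviation.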

Saturday, May 30, 2009

Pilot data set.

I've made plots of the pilot data, raw and Z-normalized.

The PDF versions are here: the raw and the normalized data.

Thursday, May 28, 2009

Working on the Pilot data set.

Aiming for the dissertation thesis proposal, I'm working on the Trajectory code right now. The main change here is that the data now comes from a single project and is categorized by user, instead of simply navigating "anonymous" streams as before. This change in the analysis flow pushed me to change the database schema, and it now looks as follows:

Also, the changes in the schema and analysis forced me to rewrite a bunch of the iBATIS queries. The coolest query so far is this one:
SELECT sm.id AS motif_id, sm.substring AS motif, sme.id AS entry_id,
  (SELECT COUNT(*) FROM sax_motif_offset
     JOIN sax_motif_entry ON sax_motif_offset.sax_motif_entry = sax_motif_entry.id
     JOIN sax_motif ON sax_motif.id = sax_motif_entry.sax_motif
   WHERE sax_motif.id = motif_id) AS entry_frequency
FROM sax_motif sm
  JOIN sax_motif_entry sme ON sme.sax_motif = sm.id
  JOIN chart ON chart.id = sme.chart
WHERE sm.sax_index = #value#
GROUP BY motif
HAVING entry_frequency > 1
ORDER BY entry_frequency DESC;
It retrieves all motifs for the specific index that occur more than once, sorted by frequency in descending order.

Monday, May 11, 2009

litreview: almost there

I've finished incorporating Philip's suggestions and syntax fixes (thank you!) into my litreview. The last thing left is the BibTeX-formatted bibliography, which for some reason is getting misformatted by TeX.
The litreview pre-final draft is here: litreview draft

Monday, May 4, 2009

Literature review draft finished.

I spent last week working on the literature review, finished the final draft, and submitted it for review.
While reviewing SAX papers I got ideas and answers for the algorithmic questions I had hit while coding my implementation (concerning the normalization and distribution issues). This week I'm going to embed these special cases into my code and get back on track with the Trajectory software.
Meanwhile, I'm transitioning in my writing from the literature review to the thesis proposal. Looks doable (-:

Monday, April 27, 2009

First draft of litreview.

I spent last week reading for and writing the literature review. I think I've put most of the stuff I wanted into the text and now just need to "refactor" it, unifying all those math terms and making smooth transitions between chapters. I'm thinking of writing a short conclusion section too.
linky

Monday, April 20, 2009

Working on the literature review and digging some interesting stuff.

I'm keeping up with last week's plan, and here is the Sunday evening draft of my literature review. It's not proofread or wrapped up nicely, but it contains all the ideas I want to put in so far. I'm a little stuck on the lower bounding of distances, but I will get over it in a couple of days, and I hope to put this together with a walk through the time-series decomposition techniques by the beginning of next week. I'm aiming to have the first full draft by next Monday.

As for findings, check out the following plot and my previous thoughts about the Hackydatat distribution. After all, they look pretty similar in some sense, although I had never read this article before.