Pavel's Software Engineering Log: September 2008

Sunday, September 28, 2008

Example: how DTW works on 3 peaks and 2 peaks.

A simple example of the DTW algorithm applied to two trajectories: one has three peaks, the other has two. In my opinion the two trajectories are distinct enough to be considered non-similar: trajectory #1 has two minor activity peaks on either side of a major activity peak, while trajectory #2 has two similar peaks of activity during the time trajectory #1 is idle. The goal of this exercise is to explore the parts of the DTW toolkit that prevent these trajectories from aligning. In particular, I'd like to try the Sakoe-Chiba Band and the Itakura Parallelogram, as shown in Figure 4 of Salvador, Stan and Philip Chan. FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space, Intelligent Data Analysis, 2007.
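To make the band idea concrete, here is a minimal sketch of DTW restricted to a Sakoe-Chiba band. It is not the ProjectBrowser code: the class name `BandedDtw`, the `window` parameter, the absolute-difference local cost, and the sample arrays are all assumptions made for illustration.

```java
import java.util.Arrays;

final class BandedDtw {

    /**
     * DTW distance between a and b, visiting only cells with |i - j| <= window
     * (a Sakoe-Chiba band). Hypothetical helper for illustration only.
     */
    static double distance(double[] a, double[] b, int window) {
        int n = a.length;
        int m = b.length;
        // The band must be at least |n - m| wide, otherwise the end cell is unreachable.
        int w = Math.max(window, Math.abs(n - m));

        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            int from = Math.max(1, i - w);
            int to = Math.min(m, i + w);
            for (int j = from; j <= to; j++) {
                // Local cost: absolute difference (an assumption, other costs are possible).
                double d = Math.abs(a[i - 1] - b[j - 1]);
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        // Made-up series shaped like the three-peak and two-peak trajectories.
        double[] threePeaks = {0, 1, 0, 0, 3, 0, 0, 1, 0};
        double[] twoPeaks = {0, 0, 2, 0, 0, 0, 2, 0, 0};
        System.out.println("full matrix: " + distance(threePeaks, twoPeaks, threePeaks.length));
        System.out.println("band w = 1:  " + distance(threePeaks, twoPeaks, 1));
    }
}
```

A narrower `window` allows less warping, so the banded distance can never be smaller than the unconstrained one. With `window = 0` and series of equal length, the path is forced onto the diagonal and the result is just the sum of point-wise differences.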

The original data plotted

Time series

Naive DTW DP in action

Before and After DTW

My Java DTW implementation finally works within the ProjectBrowser

Over the last week I fixed my DTW algorithm implementation in Java. The next two screenshots show the reference R implementation on the left and my Java implementation on the right. The two produce identical results both before and after DTW.

And the immediate effect on the Trajectory tool visualization:

Monday, September 22, 2008

Placeholder for Monday, September 22

It's about 10:32 PM (MST) and I am late with posting my entry. I just found out why my DTW implementation doesn't match the reference one: I was not averaging the points that get merged during DTW, and I was not interpolating the ones that fall into gaps. The bug is understood, but I don't feel like patching it right now: it requires writing tests and altering some code. I will update this blog entry tomorrow. Meanwhile I went through the "spaghetti code" sections and some telemetry artifacts remaining in my code and cleaned them up; however, I think the sequence of events needed to get the three plots from the previous entry is still far from perfect. I am also working on the DTW panels UI: I cleaned some junk out of the streams statistics and introduced a little DTW statistics panel. See you a little later :).
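The fix described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: the class `WarpAverager`, the `path` layout (pairs of reference index and warped-series index), and the choice of linear interpolation for gaps are assumptions.

```java
final class WarpAverager {

    /**
     * Re-samples series b onto the time axis of the reference series.
     * path[k] = {i, j} means reference index i is matched with b[j].
     * Several b points matched to the same i are averaged; an i with no
     * match is filled by linear interpolation between its neighbours.
     */
    static double[] warp(double[] b, int[][] path, int refLength) {
        double[] sum = new double[refLength];
        int[] count = new int[refLength];
        for (int[] step : path) {
            sum[step[0]] += b[step[1]];
            count[step[0]]++;
        }

        double[] out = new double[refLength];
        for (int i = 0; i < refLength; i++) {
            out[i] = count[i] > 0 ? sum[i] / count[i] : Double.NaN;
        }

        // Fill gaps left to right; a gap at either edge copies the nearest value.
        for (int i = 0; i < refLength; i++) {
            if (!Double.isNaN(out[i])) {
                continue;
            }
            int left = i - 1; // already filled by an earlier iteration, or -1
            int right = i + 1;
            while (right < refLength && Double.isNaN(out[right])) {
                right++;
            }
            if (left >= 0 && right < refLength) {
                double t = (double) (i - left) / (right - left);
                out[i] = out[left] + t * (out[right] - out[left]);
            } else if (left >= 0) {
                out[i] = out[left];
            } else if (right < refLength) {
                out[i] = out[right];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] b = {1, 3, 5, 7};
        int[][] path = {{0, 0}, {0, 1}, {1, 2}, {3, 3}}; // index 2 has no match
        System.out.println(java.util.Arrays.toString(warp(b, path, 4)));
    }
}
```

In the example, `b[0]` and `b[1]` are both matched to reference index 0 and get averaged, while reference index 2 has no match and is interpolated between its neighbours.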

Monday, September 15, 2008

Naive DTW is up and running.

So I've made it through the Wicket data models and ended up with a lot of spaghetti code for this version while refactoring the telemetry/trajectory code and moving my previous DTW code out of trajectory. Look at the amount of debug information I am printing (haha!):

Voila:

The trajectory convenience in action, "the base page":

If you find the little "Do DTW analysis" button to the right of the "Get Chart" button and click it, this is what you'll see:

Along with the streams statistics on the left, you can see a sequence of three plots:
1) the original trajectory/telemetry chart,
2) the normalized time series, which have zeroes instead of NA (I know it'll work for the DevTime chart, but I need to figure out what to do with other streams),
3) the DTW-treated normalized time series.
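The preparation step behind plot 2 might look like the sketch below. The post does not say which normalization is used, so z-normalization here is an assumption, as are the class name `SeriesPrep` and the use of `null` to stand in for R's NA.

```java
final class SeriesPrep {

    /** Replaces missing values (null, standing in for R's NA) with 0.0. */
    static double[] naToZero(Double[] raw) {
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = (raw[i] == null) ? 0.0 : raw[i];
        }
        return out;
    }

    /** Z-normalization (assumed): subtract the mean, divide by the population standard deviation. */
    static double[] zNormalize(double[] x) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;

        double var = 0.0;
        for (double v : x) {
            var += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(var / x.length);

        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // A constant series has sd == 0; return zeros instead of dividing by zero.
            out[i] = (sd == 0.0) ? 0.0 : (x[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        Double[] devTime = {2.0, null, 4.0, null}; // made-up values with gaps
        double[] prepared = zNormalize(naToZero(devTime));
        System.out.println(java.util.Arrays.toString(prepared));
    }
}
```

Treating a gap as zero is reasonable for a stream like DevTime, where no data plausibly means no activity. For a stream where zero is a meaningful measurement, the same substitution would distort the series, which is the open question noted in item 2.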

Nice... However, I found that the result is somewhat different from the R implementation I am using as the reference:

Right now I am debugging what is going wrong with my dynamic programming implementation, and meanwhile trying to clean up the repetition in my code. While debugging, I found an awesome new feature in Excel 2007, conditional formatting. Just look at these beautiful heat maps:

The bottom line for this week's progress is that Naive DTW is implemented and embedded in the ProjectBrowser. You might ask, "So what about similarity?" Well, I have code to calculate the Euclidean distances between time series too, and I have the numbers calculated; I just need to find a place to render them in the UI panels.
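For reference, the Euclidean distance between two equal-length series is the square root of the summed squared point-wise differences. A minimal sketch follows; the class and method names are hypothetical, not taken from the ProjectBrowser code.

```java
final class EuclideanDistance {

    /** Euclidean distance between two series of the same length. */
    static double between(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("series must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(between(new double[]{0, 0}, new double[]{3, 4})); // prints 5.0
    }
}
```

The equal-length requirement is what makes this a natural measure after DTW, once both series have been mapped onto a common time axis.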

Next week's plan: code cleanup and optimization, and using R charting instead of GoogleCharts.

Monday, September 8, 2008

Implementing DTW in trajectory

In my previous post about hackystat-trajectory progress I said that I was almost there with implementing the indent for the trajectory charts, as in the following screenshot:

Yeah, in fact it took me about 3 hours of typing (I type using two fingers :P), since I had to alter a whole bunch of code that deals with TelemetryStream... So, basically, my initial estimate of the effort needed turned out to be wrong :(. Anyway, the indent is in place and now I can choose particular intervals from the telemetry streams of two projects and align them the way I like.

The next research question I'm working on is how to quantify the level of difference (similarity) between two trajectories. My idea of the UI and workflow for this task is that once a user is satisfied with the time series (telemetry interval) selection and alignment on the Trajectory page, he (or she) can proceed to the next level of analysis using a brand new UI page, which I call the Trajectory DTW analysis page. It visualizes the DTW algorithm at work and quantifies the similarity of trajectories using Euclidean distance:

Tuesday, September 2, 2008

Some R color palettes

While working on the trajectory plotting procedure I got curious about which colors to use in my plots. I did a little research on the palettes commonly used in R, and here they are:

R color palettes

The R code snippet:


YlOrBr <- c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404")
YlOrBr.Lab <- colorRampPalette(YlOrBr, space = "Lab")
YlOrBr.Lab.bias <- colorRampPalette(YlOrBr, space = "Lab", bias=0.5)

jet.colors <-
  colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan",
                     "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))

rgb.palette <- colorRampPalette(c("blue", "orange", "red"),
                                     space = "rgb")
Lab.palette <- colorRampPalette(c("blue", "orange", "red"),
                                     space = "Lab")

demo.pal <-
   function(n, border = if (n<32) "light gray" else NA,
             main = paste("color palettes;  n=",n),
             ch.col = c("YlOrBr.Lab.bias(n)", "YlOrBr.Lab(n)",
                        "heat.colors(n)", "terrain.colors(n)",
                        "cm.colors(n)", "topo.colors(n)",
                        "rainbow(n, start=.7, end=.1)", "jet.colors(n)",
                        "rgb.palette(n)", "Lab.palette(n)"))
     {
       nt <- length(ch.col)
       i <- 1:n; j <- n / nt; d <- j/6; dy <- 2.2*d
       plot(i,i+d, type="n", yaxt="n", ylab="", main=main)
       for (k in 1:nt) {
         rect(i-.5, (k-1)*j+ dy, i+.4, k*j,
         col = eval(parse(text=ch.col[k])), border = border)
         text(2*j,  k * j +dy/4, ch.col[k])
       }
     }
n <- if(.Device == "postscript") 64 else 18
# Since for screen, larger n may give color allocation problem
demo.pal(n)

Monday, September 1, 2008

Starting the Fall'08.

Officially starting the Fall'08 semester.

This Fall I'm taking three classes: Compilers, Algorithms, and an independent study. As part of my independent study, I'll be blogging my progress weekly here.

So, I spent last week coding the Trajectory page for the ProjectBrowser component of Hackystat. This was my very first experience using the Apache Wicket framework as well as the Jetty web server, since in my previous web development I used Google Web Toolkit and Apache Tomcat. It was fun to learn a new framework which is so different from GWT. I can't say if it's better or worse, but for sure it's different. For now it looks way easier to start coding with Wicket than with GWT. When I started, I didn't follow the Hints for development guide but cloned the telemetry package and began altering the existing code.

I already had the custom telemetry simulation package I coded earlier for the stand-alone trajectory, so data for two trajectories were populated right away into my local sensorbase:

trajectory1 devtime plot

Trajectory1 has a lifecycle from January 1, 2008 to February 1, 2008

trajectory2 devtime plot

while Trajectory2 has a lifecycle from March 1, 2008 to April 1, 2008.

The lack of overlap between the lifecycles of trajectory1 and trajectory2 makes it impossible to plot both telemetry curves simultaneously using the Telemetry component of ProjectBrowser.

The "mission" of the Trajectory component is to overcome this limitation. In the following screenshot you can see the third week of devtime for trajectory1 and the second and third weeks of trajectory2:

the third week of devtime for the trajectory1 and second and third weeks of trajectory2

And the Trajectory UI screenshot:
The Trajectory UI screenshot.

The non-cropped version:

There are some changes I've made to the original Telemetry UI and plotting protocol that I'd like to get feedback on:

1) The color scheme for the plots: I've coded a new "color-picker" method which picks colors from the JetColors palette instead of selecting colors at random. I personally found that some of the random colors selected by the original Telemetry color-picker are very hard to read.

2) As you can see, there are no "date ticks" on the X axis of the plot... and I don't like it, but I haven't yet explored the possibility of printing two timelines along the X axis.

3) Right now I can extend the date interval for one of the two projects to the right. I don't know if it's good or not, but it's a feature rather than a bug (IMO). I also think it would be nice to be able to introduce an indent before the beginning of one project's interval too... I'm almost there.

4) Should I merge my code with the hackystat-ui-wicket trunk?
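As an illustration of item 1, a deterministic color picker over a jet-style ramp could be sketched like this. It is not the actual ProjectBrowser method: the class `JetPalette` and the linear RGB interpolation are assumptions, and the anchor colors are the ones commonly used for a "jet" ramp in R.

```java
final class JetPalette {

    // Anchor colors of a jet-style ramp, as 0xRRGGBB (dark blue to dark red).
    private static final int[] ANCHORS = {
        0x00007F, 0x0000FF, 0x007FFF, 0x00FFFF,
        0x7FFF7F, 0xFFFF00, 0xFF7F00, 0xFF0000, 0x7F0000
    };

    /** Returns n colors spaced evenly along the ramp, as "#RRGGBB" strings. */
    static String[] pick(int n) {
        String[] out = new String[n];
        for (int k = 0; k < n; k++) {
            // Position along the ramp, in [0, ANCHORS.length - 1].
            double pos = (n == 1) ? 0.0 : (double) k * (ANCHORS.length - 1) / (n - 1);
            int lo = (int) Math.floor(pos);
            int hi = Math.min(lo + 1, ANCHORS.length - 1);
            double t = pos - lo;
            int r = mix((ANCHORS[lo] >> 16) & 0xFF, (ANCHORS[hi] >> 16) & 0xFF, t);
            int g = mix((ANCHORS[lo] >> 8) & 0xFF, (ANCHORS[hi] >> 8) & 0xFF, t);
            int b = mix(ANCHORS[lo] & 0xFF, ANCHORS[hi] & 0xFF, t);
            out[k] = String.format("#%02X%02X%02X", r, g, b);
        }
        return out;
    }

    /** Linear interpolation of one color channel. */
    private static int mix(int from, int to, double t) {
        return (int) Math.round(from + t * (to - from));
    }

    public static void main(String[] args) {
        for (String color : pick(5)) {
            System.out.println(color);
        }
    }
}
```

Because the colors depend only on how many streams are plotted, the same chart always gets the same colors, and neighbouring streams stay visually distinct.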

So, this is my current progress, and right now I'm implementing the Dynamic Time Warping algorithm for comparing trajectories. The current idea is to display the warped curves and the warping procedure parameters just below the original trajectory plot.
