Tuesday, January 27, 2009

User-account environment variables under Vista

I had some problems running the TeXLive binaries under Vista, caused by a conflict between the Cygwin TeX installation and TeXLive. The TeXLive installer adds the path to its binaries as a user-specific environment variable, which Windows then appends to the very end of PATH; consequently, the TeXLive binaries are never reached. The way to fix this is to reshuffle the PATH entries in your user environment settings so that the TeXLive binaries are found first.
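As a sketch of what goes wrong and of the reordering fix, here is a minimal Python illustration of how the effective PATH is assembled; the directory names are hypothetical examples, not my actual configuration:

```python
# Sketch: Windows builds the effective PATH as system PATH + user PATH,
# so a user-level TeXLive entry lands *after* Cygwin's system-level entry
# and Cygwin's TeX binaries shadow TeXLive's.
# All directory names below are hypothetical.

def effective_path(system_entries, user_entries):
    """Mimic how Vista concatenates the system and user PATH."""
    return system_entries + user_entries

def first_match(entries, name):
    """Return the first PATH entry mentioning `name` (first match wins)."""
    return next(e for e in entries if name.lower() in e.lower())

system = [r"C:\Windows\system32", r"C:\cygwin\bin"]  # Cygwin's TeX lives here
user   = [r"C:\texlive\2008\bin\win32"]              # TeXLive, appended last

path = effective_path(system, user)
# Cygwin's bin comes first, so its TeX binaries shadow TeXLive's:
shadowing = first_match(path, "cygwin")

# The fix: reorder so the TeXLive directory precedes the conflicting entry.
fixed = [r"C:\texlive\2008\bin\win32"] + system
winner = first_match(fixed, "texlive")
```

The same first-match-wins logic is what the Windows loader applies when resolving `latex.exe`, which is why moving the TeXLive entry ahead of Cygwin's is enough.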

Monday, January 26, 2009

Literature review plans

Finally I was able (almost) to install TeXLive on my laptop. The installer never finished, so I don't really know whether everything works (the 00-00 example compiles, though). I'm running Windows Vista, and it looks like the TeXLive installer has permission issues with a non-Administrator account; there also seems to be a bug where the installer tries to use the system Perl instead of the one shipped with the installer, which causes library and runtime issues. It took me some time to figure out both issues, and at this point I don't really see the advantages of TeXLive over MikTeX. One more disadvantage is the lack of a DVI viewer in the TeXLive distro; it looks like I'd have to install a viewer separately, but I'm not sure I'll need one, so we'll see.

Among other things, I've set up all the Java stuff, updated the libraries and Hackystat, checked out all the latest sources, etc., and backed up the whole thing, just in case.
So, system is ready to go.

I spent most of the time putting together the outline for the literature review. I see the purpose of this writing as a comprehensive walkthrough of the field of time-series analysis, outlining the milestones and major discoveries and connecting them with my research. I found that I had totally missed some major things in time-series analysis (funny, huh?) and am filling these gaps by reading and collecting literature.

Following is the draft plan. I'm working on the third part, and since it is based on the material from part 2, I am changing that part's flow too.

Literature review plan


  1. Introduction. (definitions, research field boundaries and common applications)



    1. Introduction to time series.

      1. Data sources, time-series representation and common applications

        (the time series “origin”, common representation and mainstream applications)

      2. Streaming time-series.

        Time series as streams.

      3. Time-series databases and indexing

        (examples of existing time-series collections (+ the Hackystat SensorBase) and common time-series database toolkits for time-series data storage, search, and retrieval)



    2. Classical time series analyses.

      1. General exploration & description

        (time series descriptive exploration and common tools used: spectral analysis, autocorrelation, trends, periodicity (+ Hackystat Telemetry, + Hackystat Zorro?, + Hackystat Trajectory))

      2. Prediction and forecasting

        (stochastic modeling: AR, MA, ARMA, ARIMA and uses (+ Hackystat Trajectory))


    3. Time series similarity (homogeneity) based analyses.

      1. Speech and handwriting recognition.

        (pioneering the area of DTW, LCS and HMM)

      2. Sign language, motion and gesture recognition.

        (ongoing research)

      3. Trajectory patterns recognition, surveillance applications, shape recognition.

        (modern applications)






  2. Time series similarity-based analyses and algorithms

    (known research tools, implemented applications, and up-to-date research directions)



    1. Similarity metrics

      1. Euclidean distance.

        (application and problem of normalization)

      2. Hamming and Edit distances.

        (the formal introduction of edit distance, time-series transformations)



    2. Similarity-finding algorithms

      1. DTW

      2. LCS



    3. Methods (whole-sequence and subsequence applications)

      1. Clustering

      2. Indexing

      3. Classification

      4. Anomaly detection



    4. Known state-of-the-art applications.




  3. Possible applications of the algorithms and methods to the Hackystat Telemetry Streams



    1. Similarity search in the Sensorbase

      (the search for similarity using the raw telemetry data stored within the sensorbase)

    2. Telemetry Streams data indexing

      (defining the Telemetry patterns, indexing raw telemetry data using the definitions, and conducting search by means of indices and the Edit distance)

    3. Live Telemetry Stream analysis and features

      (patterns, anomaly detection)
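Since DTW and the normalized Euclidean distance come up repeatedly in the plan above, here is a minimal sketch of both in their usual textbook formulations; this is not tied to any particular implementation from the literature, and the series are plain lists of numbers:

```python
# Sketch of two similarity measures from the outline:
# z-normalized Euclidean distance and classic dynamic time warping (DTW).
import math

def znorm(series):
    """Z-normalize a series so distances ignore offset and scale."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series] if std > 0 else [0.0] * len(series)

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Classic O(n*m) DTW: cost of the optimal warping path."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = min cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

DTW between a series and a time-shifted copy of itself is zero even where the pointwise Euclidean distance is large, which is exactly why the outline treats the two separately.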


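For the edit-distance items in the plan (section 2.1.2 and the index-based search in part 3), a minimal textbook Levenshtein implementation looks like the sketch below; the step of discretizing telemetry data into symbol strings is assumed, not shown:

```python
# Sketch of the edit (Levenshtein) distance, applicable once a time series
# has been discretized into a string of symbols.

def edit_distance(s, t):
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    n, m = len(s), len(t)
    # dist[i][j] = edit distance between s[:i] and t[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i          # delete all of s[:i]
    for j in range(m + 1):
        dist[0][j] = j          # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution
    return dist[n][m]
```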

Tuesday, January 20, 2009

First post in 2009

The TechReport for 699 course. The importance of. :)
I finished the Fall 2008 semester by writing a technical report on the DTW algorithm, its existing implementations, uses, and extensions, and outlined a possible application to software metrics based on my own implementation. While I was a little worried about this before starting, I found the report writing to be an extremely helpful and interesting activity, for a number of reasons:

  • first of all, it forced me to summarize and distill the essence of the work done so far and get it graded. By doing so I not only reassessed my current position in the research but, what I found extremely useful, was able to identify gaps (weak points) in the research I am doing. The goal now is to make my research complete by evenly covering all of the areas of interest and connecting them with adjacent fields.

  • secondly, I found that in my case writing the "tech report for 699" is actually more like writing a draft (or an outline) of the Literature Review and the Thesis Proposal: two pieces of writing required for the PhD degree. How cool is that?

  • in addition, overviewing the research done so far, seeing the items on the ToDo list, and having the LitReview and Proposal outlined clears my mind and brings confidence that I have chosen the right track.



Current progress.
This week I am working on setting up the "working environment" for this semester. Taking into account the amount of software development and writing ahead, spending some time on selecting technologies, cleaning and organizing the system environment and hard drive, updating tools, etc. seems to be a reasonable activity.

As in all previous work, I am going to use Java, Eclipse, the Hackystat infrastructure, and the standard set of Hackystat libraries for the core programming. Most likely I'll be using MiG Layout for the UI development of the stand-alone Trajectory tool. R will be used for making figures and for fast scripting when I need to test something before actually implementing it.

For my latest report and all previous LaTeX-based documents I successfully used a combination of MikTeX and TeXnicCenter, but recent changes in the CSDL requirements are moving me toward TeXLive, and I'm currently setting up the tools and environment to test this approach, which is new to me.