<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
<rss version="2.0">
<channel>
	<title>Event Cube Research Wiki</title> 
	<description></description> 
	<link>http://eventcube.atwiki.com/</link> 



		<item>
		<title>
			 <![CDATA[ answers_plugin ]]> 
		</title>
		<description>
			<![CDATA[ ==answers plugin test with wiki-mode==

The **answers plugin** __lists Yahoo! Answers entries for a topic you are interested in.__
Example) **@@answer(love)@@** is

---------

@@answers(love)@@

---------

 ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/answers_plugin
		</link>
		<pubDate>Tue, 02 Dec 2008 22:47:02 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ test1 ]]> 
		</title>
		<description>
			<![CDATA[ :k \by \log{N/n}  ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/test1
		</link>
		<pubDate>Fri, 12 Sep 2008 21:02:01 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Reference ]]> 
		</title>
		<description>
			<![CDATA[ 
[1] Ashok N. Srivastava, et al.: Enabling the Discovery of Recurring
Anomalies in Aerospace Problem Reports using High-Dimensional Clustering
Techniques, in the Proceedings of the 2006 IEEE Aerospace Conference, 2006.
[2] Suratna Budalakoti, et al.: Anomaly Detection in Large Sets of
High-Dimensional Symbol, XXXX Conference, 200X.
[3] Ashok N. Srivastava, et al.: Discovering Recurring Anomalies in Text
Reports Regarding Complex Space Systems, in the Proceedings of the 2005 IEEE
Aerospace Conference, 2005.
[4] Suratna Budalakoti, et al.: Discovering Atypical Flights in Sequences of
Discrete Flight Parameters, in the Proceedings of the 2006 IEEE Aerospace
Conference, 2006.
[5] Ashok N. Srivastava: Mixture Density Mercer Kernels: A Method to Learn
Kernels Directly from Data, in the Proceedings of the 2004 SIAM Data Mining
Conference, 2004.
[6] Ashok N. Srivastava: Onboard Detection of Snow, Ice, Clouds, and Other
Geophysical Processes Using Kernel Methods, in the Proceedings of the 2003
ICML Workshop on Machine Learning Technologies for Autonomous Space Sciences,
2003.
[7] SIAM 2003 Authors: 2004 SIAM Data Mining Conference Workshop on Data
Mining for Counter Terrorism and Security, 2003.
[8] Ashok N. Srivastava: An Overview of Data Mining at NASA, presentation,
200X.
 
 ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Reference
		</link>
		<pubDate>Mon, 08 Sep 2008 07:40:27 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Datasets and Lexicons ]]> 
		</title>
		<description>
			<![CDATA[ ==Title== ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Datasets%20and%20Lexicons
		</link>
		<pubDate>Fri, 05 Sep 2008 17:13:33 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ CIDU08 ]]> 
		</title>
		<description>
			<![CDATA[ 
Topic Cube: OLAP of Text Data


In this poster, we present a new concept called Topic Cube, which is the
first attempt to extend OLAP technology to the exploration of text data. A
topic cube lets a user analyze a large set of documents in different contexts
and topics at multiple granularities.

Text Cube: Flight Report Mining by High Dimensional OLAP
Since Jim Gray introduced the concept of the “data cube” in 1997, the data
cube, together with online analytical processing (OLAP), has become a driving
engine of the data warehouse industry.  Many real-life applications, such as the
airline industry, have generated an ever increasing amount of text data
associated with other multidimensional information.  To endow such data with
matching analytical capacities, this poster proposes a novel cube model, called
“text cube”, that integrates the power of traditional OLAP and IR techniques
for text mining.
Text cube can contribute greatly to airplane safety studies. After each
commercial flight, a record is created which contains pilot reports logging
observations during the flight, together with other multidimensional
information such as weather conditions, anomaly types and engine models.  On
one hand, traditional OLAP cubes are equipped with a wealth of powerful tools
for classification, correlation analysis and pattern mining to perform advanced
analysis of the multidimensional information; on the other hand, advances in
information integration and content summarization offer great opportunities for
insightful discoveries in the pilot reports. Text cube combines the power of
both fields so that structured and unstructured data mutually enhance the
mining and understanding of anomalous aviation events.
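
A minimal sketch of the text cube idea, not the poster's implementation. It assumes a tiny hypothetical set of flight records with two structured dimensions (`weather`, `engine`) and a free-text `report` field; none of these names or values come from the poster. Each cell is a choice of one value per dimension, with `*` meaning "all values", and the measure stored in a cell is the aggregated term-frequency vector of the reports that fall into it.

```python
from collections import Counter
from itertools import product

# Hypothetical records: structured dimensions plus a free-text pilot report.
RECORDS = [
    {"weather": "snow", "engine": "A", "report": "ice on wing delayed takeoff"},
    {"weather": "snow", "engine": "B", "report": "ice warning during climb"},
    {"weather": "clear", "engine": "A", "report": "engine vibration during climb"},
]
DIMENSIONS = ("weather", "engine")


def build_text_cube(records, dimensions):
    """Map every cell (one value or '*' per dimension) to a term-frequency Counter."""
    cube = {}
    for record in records:
        terms = Counter(record["report"].lower().split())
        # Each record contributes to 2^d cells: every dimension is either
        # kept at its own value or rolled up to '*'.
        options = [(record[d], "*") for d in dimensions]
        for cell in product(*options):
            cube.setdefault(cell, Counter()).update(terms)
    return cube


cube = build_text_cube(RECORDS, DIMENSIONS)
print(cube[("snow", "*")].most_common(1))  # most frequent term in snowy flights
print(cube[("*", "A")]["climb"])           # count of "climb" for engine model A
```

Materializing every cell is only practical for a few dimensions; it is used here because it makes the roll-up from a specific cell to `*` easy to see.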
Repetitive Sequential Patterns for Finding Anomalies
As a step towards discovering recurring anomalies in atypical
flights, we study the problem of mining repetitive sequential patterns. After
the flight of a commercial airliner, we have two types of informative records
in the form of a sequence: (i) the text report that narrates the observed
anomalies; (ii) the ordered list of discrete parameters that correspond to
binary switches inside the cockpit. Given a collection of sequences either from
(i) or from (ii), an initial step in analyzing these sequences is to find
the frequent sequential patterns. These frequent patterns may themselves
indicate significant factors behind anomalies, or they can be used as features
for clustering/classifying the sequences to help safety experts discover
recurring anomalies.
In this poster, we introduce our work on finding frequent sequential
patterns for these sequences. The patterns we are interested in might repeat
multiple times in a sequence, and thus they are called repetitive sequential
patterns. Unlike traditional sequential pattern mining algorithms,
our mining algorithms also capture the instances of a pattern repeating within
each sequence; therefore, whether a pattern is output also
depends on how many times it repeats within each sequence, and this information
is provided to users for analysis. Experiments are conducted to
show the effectiveness of our methods.
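
The difference from per-sequence support can be sketched as follows. This is an illustration under stated assumptions, not the poster's algorithm: a pattern instance is taken to be a gapped subsequence, instances are matched greedily from left to right and may not overlap or interleave, and the support of a pattern is the total number of instances over all sequences. The function names and the sample sequences are hypothetical.

```python
def count_repetitions(sequence, pattern):
    """Count non-overlapping occurrences of `pattern` as a gapped subsequence.

    Greedy left-to-right matching: each event in `sequence` is used at most
    once, and a new instance starts only after the previous one is complete.
    """
    if not pattern:
        return 0
    count = 0
    position = 0  # index of the next pattern item to match
    for event in sequence:
        if event == pattern[position]:
            position += 1
            if position == len(pattern):
                count += 1
                position = 0
    return count


def repetitive_support(sequences, pattern):
    """Return the per-sequence repetition counts and their total."""
    per_sequence = [count_repetitions(s, pattern) for s in sequences]
    return per_sequence, sum(per_sequence)


# Hypothetical switch-event sequences (each letter stands for one cockpit switch).
sequences = ["ABCABDAB", "BAAB", "CCC"]
per_sequence, total = repetitive_support(sequences, "AB")
print(per_sequence, total)  # [3, 1, 0] 4
```

A traditional sequential pattern miner would report a support of 2 for `AB` here (it occurs in two of the three sequences); the repetition counts additionally show that the first sequence contains it three times.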
 ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/CIDU08
		</link>
		<pubDate>Thu, 04 Sep 2008 23:27:02 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Menu ]]> 
		</title>
		<description>
			<![CDATA[ ==menu==
* [[FrontPage]]
* [[Menu]]
----
==recent list 20 ==
@@recent(20)@@
----












 ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Menu
		</link>
		<pubDate>Thu, 04 Sep 2008 19:12:16 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Discussion Summary ]]> 
		</title>
		<description>
			<![CDATA[ 
Presentation File
1. Abstract & Introduction
2. Data Description & Preprocessing
Q: It seems that the reduced one-dimensional sequences lose the time-step
information.
3. Outline of Approach
4. The Normalized LCS Measure
5. Clustering & Outlier Detection
Q: How should the value of K be chosen for clustering?
Q: How should we decide what percentage of the flights in one cluster are
flagged as outliers?
6. Detecting Anomalous Events in Atypical Flights
Q: Besides the three types mentioned in this paper, are there any other
anomalous event types?
Q: It seems that the criteria for insertion and deletion are locally greedy.
7. Experiment & Conclusion
Q: Can we get the same detailed data?
Q: There is no feedback from airplane safety experts who can judge the
effectiveness of the experimental results. Are there any ways for us to evaluate
the usefulness and accuracy?
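
For item 4, a minimal sketch of a normalized LCS similarity between two discrete sequences. The normalization used here, LCS length divided by the square root of the product of the two sequence lengths, is an assumption made for illustration; the paper under discussion should be checked for the exact definition. The function names are hypothetical.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def normalized_lcs(a, b):
    """LCS length divided by sqrt(len(a) * len(b)); 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))


print(normalized_lcs("ABCD", "ABCD"))  # identical sequences give 1.0
print(normalized_lcs("AB", "CD"))      # no common symbols give 0.0
```

Because the LCS only preserves symbol order, this measure illustrates the first question above: two flights with the same switch order but very different timing score as identical.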
 ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Discussion%20Summary
		</link>
		<pubDate>Mon, 26 May 2008 06:12:37 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Part of Experimental Results ]]> 
		</title>
		<description>
			<![CDATA[ ==Predict Missing Event Anomalies==

Here are some examples of missing label prediction for ASRS data by our algorithm: [http://eventcube.atwiki.com/file/open/21/Anomaly_Prediction_Examples.pdf Example] ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Part%20of%20Experimental%20Results
		</link>
		<pubDate>Mon, 26 May 2008 05:29:35 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Event Cube Project ]]> 
		</title>
		<description>
			<![CDATA[ 
Keywords

Proactive risk management
What Happened, and Why
Categorizing and detecting anomalies
Safety documents

 
 
Thinking

Is there any one-to-many classification algorithm?

 
 
ASRS Data Set

After each commercial flight in the United States, a report is written for
that flight describing how the flight went and whether any anomalous
events occurred. This data set is a collection of 20,696 of those reports
categorized into 62 different anomalies. Between zero and twelve different
anomalies were assigned to each report. [1]

 
 
Problems
 
Automatic Categorization of ASRS Reports
Problem Description

Classification problem
Obtain mapping: ASRS Report Extract => ASRS Anomaly Categories
(one one-to-many mapping, or many one-to-one mappings?)
Automatically categorize new reports
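
The "many one-to-one mappings" option can be sketched as a decomposition into one binary problem per anomaly category (often called binary relevance). This is only an illustration: the category names and label sets below are hypothetical, and the classifier that would be trained on each target list is left out.

```python
def to_binary_targets(report_labels, categories):
    """Turn per-report label sets into one 0/1 target list per category."""
    return {
        category: [1 if category in labels else 0 for labels in report_labels]
        for category in categories
    }


# Hypothetical anomaly labels for four reports; a report may have no label.
report_labels = [
    {"altitude deviation", "weather"},
    {"weather"},
    set(),
    {"runway incursion", "altitude deviation"},
]
categories = sorted(set().union(*report_labels))
targets = to_binary_targets(report_labels, categories)
for category in categories:
    print(category, targets[category])
```

Each target list can be paired with the same document features to train an independent yes/no classifier; a new report is then assigned every category whose classifier answers yes, which allows zero or several anomalies per report.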

Existing Solution A
Proposed in [8]

Convert documents into a vector space representation (“Bag of Words”
matrix)
SVM with Natural Language (Pre)Processing (NLP)

     (NLP is expensive - large hand-crafted rule bases)

Mariana (an advanced Markov Chain Monte Carlo algorithm to find the best
SVM hyperparameters) without NLP
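
A minimal sketch of the “Bag of Words” step on raw text, assuming whitespace tokenization and lowercasing only (no NLP preprocessing). The sample documents are hypothetical, and the SVM training and hyperparameter search are not shown.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical report extracts.
documents = ["engine fire warning", "fire in engine engine shut down"]
vocabulary, matrix = bag_of_words(documents)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is the feature vector of one report; word order is discarded, which is what makes the representation cheap compared with rule-based NLP.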

Experimental Result A

Mariana without NLP (with raw text only) = SVM with NLP > other methods
with NLP

 
Searching for Recurring Anomalies
Problem Description

Clustering problem
Recurring Anomalies - "anomalies that may be described in different ways by
different authors, at varying times and under varying conditions, but that are
truly about the same part of the system." [1]
Given a set of N documents, where each document is a free text English
document that describes a problem, an observation, a treatment, a study, or
some other aspect of the vehicle, automatically identify a set of potential
recurring anomalies in the reports. [3]
Related Problems: Topic Detection and Tracking

Existing Solution A
Proposed in [3]

Representing Text Documents in a Vector Space: Term-document matrix (or
“Bag of Words” matrix)
Preprocessing: Reduce the number of dimensions (e.g., PCA)
Using existing methods - Several clustering methods for high-dimensional
data are examined
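
Clustering in this vector space needs a way to compare two reports. A minimal sketch using cosine similarity over raw term counts follows; the choice of cosine similarity is an assumption for illustration, since [3] examines several clustering methods, and the dimension reduction step is omitted. The sample texts are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two descriptions of the same problem by different authors, and an unrelated one.
print(cosine_similarity("fuel leak left tank", "leak in left fuel tank"))
print(cosine_similarity("fuel leak left tank", "smoke in cabin"))
```

Reports that describe the same part of the system in different word order score close to 1, while unrelated reports score 0, so a clustering method can group the former together.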

Existing Solution B
Proposed in [1]

A new mixture model: assume each document is generated by a distinct
multinomial distribution


Each recurring anomaly document is generated by: a general English language
model (the choice of words), a topic model (type of the problem), and a
document-specific information model (problem details)
Solve the multinomial distribution parameter estimation problem


Discover and cluster recurring anomalies based on the distance between
different distributions
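
A simplified sketch of the last two steps: estimate one multinomial word distribution per document, then compare documents by a distance between distributions. It does not implement the three-component mixture of [1]. The additive smoothing and the use of Jensen-Shannon divergence as the distance are assumptions for illustration, and the sample reports are hypothetical.

```python
from collections import Counter
from math import log2


def estimate_multinomial(tokens, vocabulary, alpha=1.0):
    """Additively smoothed word probabilities; tokens must all be in vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) of two distributions over the same keys."""
    def kl(x, y):
        return sum(x[t] * log2(x[t] / y[t]) for t in x if x[t] > 0)

    m = {t: 0.5 * (p[t] + q[t]) for t in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical tokenized reports.
docs = {
    "r1": "hydraulic leak left gear".split(),
    "r2": "hydraulic leak right gear".split(),
    "r3": "smoke in cabin".split(),
}
vocabulary = sorted({t for tokens in docs.values() for t in tokens})
models = {name: estimate_multinomial(tokens, vocabulary) for name, tokens in docs.items()}
print(jensen_shannon(models["r1"], models["r2"]))  # small: similar reports
print(jensen_shannon(models["r1"], models["r3"]))  # larger: unrelated reports
```

Smoothing keeps every probability positive, so the divergence stays finite even when two reports share no words.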

 
 
Reference
[1] Ashok N. Srivastava, et al.: Enabling the Discovery of Recurring
Anomalies in Aerospace Problem Reports using High-Dimensional Clustering
Techniques, in the Proceedings of the 2006 IEEE Aerospace Conference, 2006.
[2] Suratna Budalakoti, et al.: Anomaly Detection in Large Sets of
High-Dimensional Symbol, XXXX Conference, 200X.
[3] Ashok N. Srivastava, et al.: Discovering Recurring Anomalies in Text
Reports Regarding Complex Space Systems, in the Proceedings of the 2005 IEEE
Aerospace Conference, 2005.
[4] Suratna Budalakoti, et al.: Discovering Atypical Flights in Sequences of
Discrete Flight Parameters, in the Proceedings of the 2006 IEEE Aerospace
Conference, 2006.
[5] Ashok N. Srivastava: Mixture Density Mercer Kernels: A Method to Learn
Kernels Directly from Data, in the Proceedings of the 2004 SIAM Data Mining
Conference, 2004.
[6] Ashok N. Srivastava: Onboard Detection of Snow, Ice, Clouds, and Other
Geophysical Processes Using Kernel Methods, in the Proceedings of the 2003
ICML Workshop on Machine Learning Technologies for Autonomous Space Sciences,
2003.
[7] SIAM 2003 Authors: 2004 SIAM Data Mining Conference Workshop on Data
Mining for Counter Terrorism and Security, 2003.
[8] Ashok N. Srivastava: An Overview of Data Mining at NASA, presentation,
200X.
 ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Event%20Cube%20Project
		</link>
		<pubDate>Mon, 26 May 2008 05:05:36 +0000</pubDate>
	</item>
		<item>
		<title>
			 <![CDATA[ Question and Answer ]]> 
		</title>
		<description>
			<![CDATA[ ==Questions==
* So far the only data that we down here at UTD have been able to locate is contained in PDF format at the first of the two links provided on the wiki's main page.  Is this the only format that the data will be available in, or will there be a more processing-friendly format available for our usage?

* What platform does the code need to run on, and is there a preferred programming language to use?  Are we free to use any programming language we want, or should we select one for the whole project to use?
---- ]]> 
		</description>
		<link>
			http://eventcube.atwiki.com/page/Question%20and%20Answer
		</link>
		<pubDate>Sun, 17 Feb 2008 01:00:51 +0000</pubDate>
	</item>
	

</channel>
</rss>