
Discussion Summary


 

Summary080122

Xide Lin, UIUC

Presentation File 

1. Abstract & Introduction

2. Data Description & Preprocessing
Q: It seems that the reduced one-dimensional sequences lose the time-step information.

3. Outline of Approach

4. The Normalized LCS Measure
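
The summary does not record the exact definition of the measure discussed in the talk. As a minimal sketch, assuming the length of the longest common subsequence (LCS) is normalized by the geometric mean of the two sequence lengths, the similarity between two symbol sequences could be computed as follows. The function names are hypothetical.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)  # DP row for the previous element of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length scaled into [0, 1].

    Assumption: the normalizer is sqrt(len(a) * len(b)); the talk may
    have used a different one.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))
```

Identical sequences score 1.0 and sequences with no common symbol score 0.0, so the value can be used directly as a similarity for clustering.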

5. Clustering & Outlier Detection
Q: How to decide the value of K in clustering?
Q: How to decide the percentage of the flights in one cluster which are detected as outlier?

6. Detecting Anomalous Events in Atypical Flights
Q: Besides the three types mentioned in this paper, are there any other anomalous event types?
Q: It seems that the criteria for insertion and deletion are locally greedy.

7. Experiment & Conclusion
Q: Can we get the same detailed data?
Q: There is no feedback from airplane safety experts who could judge the effectiveness of the experimental results. Are there any ways for us to evaluate their usefulness and accuracy?

 

Summary080129

Bolin Ding, UIUC

Presentation File 1 --- Generating Kernel Function from Ensemble of Mixture Models for Clustering

Presentation File 2 --- Introduction to Dataset and High-Dimensional Clustering

Clustering Problem in Event Cube:

1. Clustering Reports

2. Finding Recurring Anomalies - "anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system."

Q1: How to define Recurring Anomalies more formally?

Q2: What is the motivation?

Q3: How to interpret the results?

Understand the Dataset more Deeply: 

Figure 1. ER Diagram of ASRS (from Aviation Safety Information Analysis and Sharing (ASIAS))

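The ER diagram of the Aviation Safety Reporting System (ASRS) consists of six tables: ASRS2_EVENT_DATA, ASRS2_ANOMALY_DATA, ASRS2_BRIEF_REPORT_TBL, ASRS2_AIRCRAFT_DATA, ASRS2_REPORTER_DATA, and ASRS2_REPORT_LIST.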

From the attributes of these tables, we draw the structure of an "event" in Figure 2.

Figure 2. Structure of an event

An event is the basic unit of information. Each event is a tuple in table ASRS2_EVENT_DATA, and its attributes describe the time and place, the weather conditions, a narrative description, and a synopsis of the event. Note that the narrative description and the synopsis are the main text information of the event. Each event is associated with an ACCESSION NUMBER, which is a unique ASRS incident tracking number and the primary reference link for the NASDAC database.

Each event tuple is associated with a tuple in table ASRS2_ANOMALY_DATA by attribute ACCESSION NUMBER; with one or more tuples in tables ASRS2_BRIEF_REPORT_TBL and ASRS2_AIRCRAFT_DATA by attributes ACCESSION NUMBER and AIRCRAFT SEQUENCE; and with one or more tuples in table ASRS2_REPORTER_DATA by attributes ACCESSION NUMBER and PERSON SEQUENCE.

  • Each tuple in table ASRS2_ANOMALY_DATA describes the anomalies of this event, including observations, consequences, resolution actions ... It also identifies who detected the anomalies (using PERSON SEQUENCEs).
  • Each pair of tuples in tables ASRS2_BRIEF_REPORT_TBL and ASRS2_AIRCRAFT_DATA (with the same AIRCRAFT SEQUENCE) holds the information about one aircraft involved in this event, including its model, flight purpose, crew quantity ... Note that more than one aircraft may be associated with an event.
  • Each tuple in table ASRS2_REPORTER_DATA describes the position, duty, and some other information of a reporter. More than one reporter may be associated with an event.
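
The structure of an event described above can be sketched as a small in-memory data model. This is only an illustration: the class and field names are hypothetical, only a few of the attributes mentioned above are included, and the single `anomaly` field assumes one ASRS2_ANOMALY_DATA tuple per event, which Q2 below leaves open.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Anomaly:
    """One tuple of ASRS2_ANOMALY_DATA (fields are illustrative)."""
    observations: List[str]
    detected_by: List[int]  # PERSON SEQUENCE values of the detectors


@dataclass
class Aircraft:
    """A BRIEF_REPORT_TBL / AIRCRAFT_DATA pair sharing one AIRCRAFT SEQUENCE."""
    aircraft_sequence: int
    model: str


@dataclass
class Reporter:
    """One tuple of ASRS2_REPORTER_DATA."""
    person_sequence: int
    position: str


@dataclass
class Event:
    """One tuple of ASRS2_EVENT_DATA plus the tuples linked to it."""
    accession_number: int
    narrative: str
    synopsis: str
    anomaly: Optional[Anomaly] = None
    aircraft: List[Aircraft] = field(default_factory=list)
    reporters: List[Reporter] = field(default_factory=list)

    def detectors(self) -> List[Reporter]:
        """Reporters who detected the anomaly, matched by PERSON SEQUENCE."""
        if self.anomaly is None:
            return []
        wanted = set(self.anomaly.detected_by)
        return [r for r in self.reporters if r.person_sequence in wanted]
```

Joining on ACCESSION NUMBER once and keeping the linked tuples inside the event object keeps the narrative and synopsis next to the structured attributes, which is convenient for the text-mining tasks discussed in these summaries.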

An example of an "event" unit can be found here.

Q1: What is the function of table ASRS2_REPORT_LIST? What is the function of attribute ASSOCIATED ACCESSION NUMBER in this table?

Q2: Can more than one tuple in table ASRS2_ANOMALY_DATA be associated with an event?

Q3: It seems there are a lot of missing values in the results retrieved from the ASRS Database Query Tool. Is that true?

 

Summary080205

Bolin Ding, UIUC

Presentation File --- Experimental methods employed in the following two papers:

Ashok N. Srivastava and Brett Zane-Ulman: Discovering Recurring Anomalies in Text Reports Regarding Complex Space Systems, in the Proceedings of the 2005 IEEE Aerospace Conference, 2005.

Ashok N. Srivastava, et al.: Enabling the Discovery of Recurring Anomalies in Aerospace Problem Reports using High-Dimensional Clustering Techniques, in the Proceedings of the 2006 IEEE Aerospace Conference, 2006.

Performance Study Methods used for the ASRS Dataset

1. Clustering Recurring Anomalies

How to represent recurring anomalies?

How to measure the effectiveness of algorithms for finding recurring anomalies?

2. Classification - Inferring Anomalies in a Report

Selecting the optimal number of terms

Comparing different classification methods and different kernels

 

Summary080229

Bolin Ding, UIUC

Presentation File --- Mining (Closed) Gapped Subsequences in Text Data and Applications

Key Points:

  • Text can be considered as a sequence of words --- preserving the information about ordering
  • A sequence of words may:
    1. appear in different reports
    2. appear multiple times in the same report

These word-sequences, like "HOT ENGINE" and "GEAR HANDLE RETRACT", describe different phenomena or causes (recurring anomalies) hidden in the reports

  • What we have: efficient algorithms for finding the sequences of words appearing frequently in the report set
  • Application:
    1. Finding recurring anomalies and helping interpret a set of reports
    2. Finding frequent and discriminative word-sequences as features for clustering / classification
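
To make the notion of a gapped subsequence concrete, the sketch below checks how often a word-sequence occurs in a report when other words are allowed between its elements, and in how many reports it occurs at all. It is not the efficient mining algorithm from the talk. The counting rule (a single left-to-right scan that restarts after each complete match) and the function names are assumptions made for illustration.

```python
def count_gapped(pattern, words):
    """Count occurrences of `pattern` in `words`, allowing gaps.

    Assumed semantics: one left-to-right scan; after a complete match
    the scan restarts, so counted occurrences do not interleave.
    """
    if not pattern:
        return 0
    count, k = 0, 0  # k = index of the next pattern word to match
    for w in words:
        if w == pattern[k]:
            k += 1
            if k == len(pattern):
                count += 1
                k = 0
    return count


def support(pattern, reports):
    """Number of reports that contain `pattern` at least once."""
    return sum(1 for words in reports if count_gapped(pattern, words) > 0)


# Made-up reports, for illustration only.
reports = [
    "GEAR HANDLE WOULD NOT RETRACT".split(),
    "HOT ENGINE THEN GEAR DOOR HANDLE STUCK RETRACT FAILED GEAR HANDLE RETRACT".split(),
    "HOT ENGINE".split(),
]
pattern = ["GEAR", "HANDLE", "RETRACT"]
```

With these made-up reports the pattern occurs once in the first report and twice in the second, which matches the two cases listed in the key points: repetition across reports and repetition within one report.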

 

Summary080311

Xide Lin, UIUC

Presentation File --- Some cube design issues

1. How to use existing Information Retrieval methodologies in data cube

2. How to achieve good scalability (a problem because of the high dimensionality of the ASRS dataset)

 

Summary080312

Feida Zhu, UIUC

Presentation File --- Signature Anomaly Keywords for Accident Prediction

Signature Motif:

Extract a short sequence of keywords from each class such that these keywords occur with high probability in the corresponding class but with low probability in all other classes.
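
The selection criterion can be illustrated with a simplified sketch that works on single keywords instead of keyword sequences. The probability of a keyword in a class is estimated here as the fraction of that class's documents containing it. The thresholds `hi` and `lo` and the function name are hypothetical.

```python
from collections import defaultdict


def signature_keywords(docs_by_class, hi=0.6, lo=0.2):
    """Pick words that are frequent in one class and rare in all others.

    docs_by_class maps a class label to a list of documents, each a
    list of words. hi and lo are illustrative thresholds.
    """
    # prob[label][word] = fraction of the class's documents containing word
    prob = {}
    for label, docs in docs_by_class.items():
        counts = defaultdict(int)
        for words in docs:
            for w in set(words):
                counts[w] += 1
        prob[label] = {w: c / len(docs) for w, c in counts.items()}

    result = {}
    for label, p in prob.items():
        keep = []
        for w, pw in p.items():
            # Highest probability of this word in any other class.
            others = max(
                (prob[o].get(w, 0.0) for o in prob if o != label),
                default=0.0,
            )
            if pw >= hi and others <= lo:
                keep.append(w)
        result[label] = sorted(keep)
    return result
```

Extending the sketch from single words to short sequences would mean replacing the per-document word set with the set of sequences found in that document.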

Statistics-Enhanced Suffix-Tree:

Enhance each suffix-tree node with statistical information

 

Summary080318

Duo Zhang, UIUC

Presentation File --- Topic Modeling Methods for ASRS Report Analysis: Topic Cube

 

Figure 1. Topic Cube

Objectives of Topic Cube:

  • Finding missing anomaly labels
  • Finding phrases to describe each anomaly event
  • Constructing a cube which combines both context (time, location) and anomaly events (topics) for OLAP analysis

Tool:

Semi-supervised PLSA

 

Summary0805-1

Bolin Ding, UIUC

Presentation File --- Text (Data) Cube, Multi-Document Summarization in Data Cube, and Stable Cluster Extraction

Text (Data) Cube:

  • High-dimensional text OLAP

Example of OLAP queries in Text Cube

(Time.year = 2007, Place.state = IL, Environment.light = daytime) Summarization of Report

(Time.year = 2007, Place.state = IL) Correlation between Environment.light, Event.anomaly, and terms in Report

(Time.year = 2007, Place.state = IL) Classifying a new Report’s Event.anomaly

  • Frequent-pattern-based data cube for text

Represent a set of reports with the set of frequent itemsets (of words)

  • Facilitating text OLAP tasks

Summarization/Correlation/Clustering/Classification tasks can be carried out based on the frequent itemsets

Multi-Document Summarization in Data Cube:

Summarize a set of documents into a set of words. This is similar to the probabilistic model, but we use TF-IDF weighting. TF-IDF weights can be easily aggregated in a data cube.
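
One way to see why TF-IDF aggregates easily is sketched below. It assumes that each cube cell stores raw term counts, that a parent cell's counts are the sum of its children's counts, and that IDF is computed once over the whole corpus. These assumptions and all function names are illustrative and are not taken from the presentation.

```python
from collections import Counter
from math import log


def idf_table(docs):
    """Global IDF, log(N / document frequency), over the whole corpus."""
    n = len(docs)
    df = Counter(w for words in docs for w in set(words))
    return {w: log(n / d) for w, d in df.items()}


def cell_tf(docs):
    """Term counts stored in one cube cell."""
    tf = Counter()
    for words in docs:
        tf.update(words)
    return tf


def merge_cells(*cells):
    """Roll-up: the parent's term counts are the sum of its children's."""
    total = Counter()
    for tf in cells:
        total.update(tf)
    return total


def summarize(tf, idf, k=3):
    """Top-k words of a cell by TF-IDF (ties broken alphabetically)."""
    scored = {w: c * idf.get(w, 0.0) for w, c in tf.items()}
    return sorted(scored, key=lambda w: (-scored[w], w))[:k]


# Made-up documents for two cells, for illustration only.
docs_il = [["gear", "handle", "gear"], ["engine", "hot"]]
docs_ca = [["gear", "light"]]
idf = idf_table(docs_il + docs_ca)
merged = merge_cells(cell_tf(docs_il), cell_tf(docs_ca))
```

Because the counts are additive, `merged` equals the counts computed directly from the union of the documents, so a roll-up does not need to touch the raw reports again.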

Stable Cluster Extraction:

  • A stable term-cluster <=> A common cause of some anomalies
  • Design idea overview
  1. For each anomaly, construct a term-graph
  2. Do clustering in the graphs to find term-clusters
  3. A term-cluster is stable if it appears with minimal changes across d different anomalies
  4. Find stable term-clusters by computing a minimal-weight tree / subgraph of size d

 

Summary0805-2

Duo Zhang, UIUC

Presentation File --- Efficient Aggregation of Topic Cube

Topic Cube:

Refer to Summary080318

Efficient Computation of Topic Cube:

One main problem in topic cube construction is how to aggregate topic models from cells. In this report, we propose a heuristic method to solve this problem. The efficiency of this method will be tested in our next step.