Xide Lin, UIUC
1. Abstract & Introduction
2. Data Description & Preprocessing
Q: It seems that the reduced one-dimensional sequences lose the time-step
information.
3. Outline of Approach
4. The Normalized LCS Measure
5. Clustering & Outlier Detection
Q: How to decide the value of K in clustering?
Q: How to decide the percentage of the flights in one cluster which are
detected as outliers?
6. Detecting Anomalous Events in Atypical Flights
Q: Besides the three types mentioned in this paper, are there any other
anomalous event types?
Q: It seems that the criteria for insertion and deletion are locally greedy.
7. Experiment & Conclusion
Q: Can we get the same detailed data?
Q: There is no feedback from airplane safety experts who can judge the
effectiveness of the experimental results. Are there any ways for us to evaluate
their usefulness and accuracy?
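A minimal sketch of the normalized LCS measure named in item 4 may help when discussing the questions above. The outline does not state the normalization, so this sketch assumes one common choice, the LCS length divided by the geometric mean of the two sequence lengths; the function names `lcs_length` and `normalized_lcs` are hypothetical.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """Assumed normalization: |LCS(a, b)| / sqrt(|a| * |b|), a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))
```

Under this assumption, identical sequences score 1 and sequences with no common symbol score 0, so `1 - normalized_lcs(a, b)` could serve as a distance for the clustering step.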
Bolin Ding, UIUC
Presentation File 1 --- Generating Kernel Function from Ensemble of Mixture Models for Clustering
Presentation File 2 --- Introduction to Dataset and High-Dimensional Clustering
1. Clustering Reports
2. Finding Recurring Anomalies - "anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system."
Q1: How to define Recurring Anomalies more formally?
Q2: What is the motivation?
Q3: How to interpret the results?

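The ER diagram of the Aviation Safety Reporting System (ASRS) database consists of six tables: ASRS2_EVENT_DATA, ASRS2_ANOMALY_DATA, ASRS2_BRIEF_REPORT_TBL, ASRS2_AIRCRAFT_DATA, ASRS2_REPORTER_DATA, and ASRS2_REPORT_LIST.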
From the attributes of these tables, we draw the structure of an "event" in Figure 2.
An event is a basic unit of information. Each event is a tuple in table ASRS2_EVENT_DATA, and its attributes describe the time/place, the weather conditions, a narrative description, and a synopsis of the event. Note that the narrative description and the synopsis are the main text information of the event. Each event is associated with an ACCESSION NUMBER, which is a unique ASRS incident tracking number and the primary reference link for the NASDAC database.
Each event tuple is associated with a tuple in table ASRS2_ANOMALY_DATA via attribute ACCESSION NUMBER; with one or more tuples in tables ASRS2_BRIEF_REPORT_TBL and ASRS2_AIRCRAFT_DATA via attributes ACCESSION NUMBER and AIRCRAFT SEQUENCE; and with one or more tuples in table ASRS2_REPORTER_DATA via attributes ACCESSION NUMBER and PERSON SEQUENCE.
An example of an "event" unit can be found here.
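The associations described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names come from the text, but the underscore column spellings, the extra columns (`SYNOPSIS`, `ANOMALY`, `MAKE_MODEL`, `ROLE`), the helper names, and all rows are hypothetical toy values, not the real ASRS schema or data.

```python
import sqlite3

# Simplified, hypothetical schema: the key attributes named in the text
# plus one illustrative column per table.
SCHEMA = """
CREATE TABLE ASRS2_EVENT_DATA    (ACCESSION_NUMBER INTEGER PRIMARY KEY, SYNOPSIS TEXT);
CREATE TABLE ASRS2_ANOMALY_DATA  (ACCESSION_NUMBER INTEGER, ANOMALY TEXT);
CREATE TABLE ASRS2_AIRCRAFT_DATA (ACCESSION_NUMBER INTEGER, AIRCRAFT_SEQUENCE INTEGER, MAKE_MODEL TEXT);
CREATE TABLE ASRS2_REPORTER_DATA (ACCESSION_NUMBER INTEGER, PERSON_SEQUENCE INTEGER, ROLE TEXT);
"""


def build_toy_db():
    """Create the toy database with one invented event."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO ASRS2_EVENT_DATA VALUES (1, 'toy synopsis')")
    conn.execute("INSERT INTO ASRS2_ANOMALY_DATA VALUES (1, 'toy anomaly')")
    conn.executemany(
        "INSERT INTO ASRS2_AIRCRAFT_DATA VALUES (?, ?, ?)",
        [(1, 1, "toy model A"), (1, 2, "toy model B")],
    )
    conn.executemany(
        "INSERT INTO ASRS2_REPORTER_DATA VALUES (?, ?, ?)",
        [(1, 1, "toy role")],
    )
    return conn


def load_event(conn, accession_number):
    """Assemble one "event" unit by following ACCESSION_NUMBER into each table."""
    key = (accession_number,)
    row = conn.execute(
        "SELECT SYNOPSIS FROM ASRS2_EVENT_DATA WHERE ACCESSION_NUMBER = ?", key
    ).fetchone()
    if row is None:
        return None
    return {
        "accession_number": accession_number,
        "synopsis": row[0],
        "anomalies": [r[0] for r in conn.execute(
            "SELECT ANOMALY FROM ASRS2_ANOMALY_DATA WHERE ACCESSION_NUMBER = ?", key)],
        "aircraft": conn.execute(
            "SELECT AIRCRAFT_SEQUENCE, MAKE_MODEL FROM ASRS2_AIRCRAFT_DATA "
            "WHERE ACCESSION_NUMBER = ? ORDER BY AIRCRAFT_SEQUENCE", key).fetchall(),
        "reporters": conn.execute(
            "SELECT PERSON_SEQUENCE, ROLE FROM ASRS2_REPORTER_DATA "
            "WHERE ACCESSION_NUMBER = ? ORDER BY PERSON_SEQUENCE", key).fetchall(),
    }
```

Calling `load_event(build_toy_db(), 1)` returns a single dictionary that groups the event with its anomaly, aircraft, and reporter tuples.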
Q1: What is the function of table ASRS2_REPORT_LIST? What is the function of attribute ASSOCIATED ACCESSION NUMBER in this table?
Q2: Can there be more than one tuple in table ASRS2_ANOMALY_DATA associated with an event?
Q3: It seems there are a lot of missing values in the results retrieved from the ASRS Database Query Tool. Is it true?
Bolin Ding, UIUC
Presentation File --- Experimental methods employed in the following two papers:
Ashok N. Srivastava and Brett Zane-Ulman: Discovering Recurring Anomalies in Text Reports Regarding Complex Space Systems, in the Proceedings of the 2005 IEEE Aerospace Conference, 2005.
Ashok N. Srivastava, et al.: Enabling the Discovery of Recurring Anomalies in Aerospace Problem Reports using High-Dimensional Clustering Techniques, in the Proceedings of the 2006 IEEE Aerospace Conference, 2006.
1. Clustering Recurring Anomalies
How to represent recurring anomalies?
How to measure the effectiveness of algorithms for finding recurring anomalies?
2. Classification - Inferring Anomalies in a Report
Selecting the optimal number of terms
Comparing different classification methods and different kernels
Bolin Ding, UIUC
Presentation File--- Mining (Closed) Gapped Subsequences in Text Data and Applications
These word sequences, like "HOT ENGINE" and "GEAR HANDLE RETRACT", describe different phenomena or causes (recurring anomalies) hidden in the reports.
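As a rough illustration of what it means for a word sequence to occur with gaps in a report, the sketch below checks whether the pattern words appear in order, with any number of other words in between, and counts how many reports contain the pattern. The function names and the toy reports are hypothetical, and this is only a containment check, not the mining algorithm from the presentation.

```python
def contains_gapped(pattern, words):
    """True if the words of `pattern` occur in `words` in order, allowing gaps."""
    it = iter(words)
    # Each pattern word consumes the iterator up to its first match,
    # so later pattern words can only match later positions.
    return all(any(w == p for w in it) for p in pattern)


def support(pattern, reports):
    """Number of reports (lists of words) that contain `pattern` as a gapped subsequence."""
    return sum(contains_gapped(pattern, words) for words in reports)


# Toy reports invented for illustration.
toy_reports = [
    "CREW NOTICED HOT LEFT ENGINE AFTER TAKEOFF".split(),
    "GEAR HANDLE WOULD NOT RETRACT".split(),
    "ENGINE WAS HOT".split(),
]
```

Here `["HOT", "ENGINE"]` matches the first toy report but not the third, because word order matters even though gaps are allowed.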
Xide Lin, UIUC
Presentation File--- Some cube design issues
1. How to use existing Information Retrieval methodologies in data cube
2. How to achieve good scalability (a scalability problem caused by the ASRS dataset's high dimensionality)
Feida Zhu, UIUC
Presentation File--- Signature Anomaly Keywords for Accident Prediction
Extract some short sequence of keywords from each class such that these keywords occur with high probability in the corresponding class but with low probability in all other classes
Enhance each suffix tree node with statistics info
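The selection criterion above can be sketched for single keywords as follows. This is a simplified illustration under assumptions: it scores individual words rather than short keyword sequences, it uses plain dictionaries instead of a suffix tree, and the thresholds `min_in` and `max_out`, the function names, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def keyword_class_rates(docs):
    """docs: list of (class_label, list_of_words).

    Returns {word: {label: fraction of that class's documents containing word}}.
    """
    class_sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for label, words in docs:
        class_sizes[label] += 1
        for w in set(words):  # count each word once per document
            counts[w][label] += 1
    return {
        w: {c: counts[w].get(c, 0) / n for c, n in class_sizes.items()}
        for w in counts
    }


def signature_keywords(docs, min_in=0.8, max_out=0.2):
    """Keep words frequent in one class (>= min_in) and rare in every other (<= max_out)."""
    result = defaultdict(list)
    for w, by_class in keyword_class_rates(docs).items():
        for c, p in by_class.items():
            others = [q for c2, q in by_class.items() if c2 != c]
            if p >= min_in and all(q <= max_out for q in others):
                result[c].append(w)
    return {c: sorted(ws) for c, ws in result.items()}


# Toy labeled documents invented for illustration.
toy_docs = [
    ("fire", ["smoke", "cabin", "engine"]),
    ("fire", ["smoke", "engine"]),
    ("gear", ["gear", "handle", "engine"]),
    ("gear", ["gear", "retract"]),
]
```

In the toy data, "engine" is rejected because it is frequent in both classes, which is the behavior the criterion asks for. The per-class counts kept here are the kind of statistics that could be attached to each suffix tree node when extending the idea to keyword sequences.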
Duo Zhang, UIUC
Presentation File--- Topic Modeling Methods for ASRS Report Analysis: Topic Cube

Semi-supervised PLSA
Bolin Ding, UIUC
Presentation File --- Text (Data) Cube, Multi-Document Summarization in Data Cube, and Stable Cluster Extraction
Example of OLAP queries in Text Cube
(Time.year = 2007, Place.state = IL, Environment.light = daytime) Summarization of Report
(Time.year = 2007, Place.state = IL) Correlation between Environment.light, Event.anomaly, and terms in Report
(Time.year = 2007, Place.state = IL) Classifying a new Report’s Event.anomaly
Represent a set of reports with the set of frequent itemsets (of words)
Summarization/Correlation/Clustering/Classification tasks can be carried out based on the frequent itemsets
Summarize a set of documents into a set of words. Similar to the probabilistic model, but we use TF-IDF weighting. TF-IDF weights can be easily aggregated in data cube
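One way to see why such weights aggregate easily is that raw term counts are additive across cells, so a roll-up only needs to sum the counts of its base cells before weighting. The sketch below assumes this design: the dimension layout (year, state, light), the choice to compute IDF by treating each base cell as one document, the function names, and the toy counts are all assumptions made for illustration, not the method from the presentation.

```python
from collections import Counter
from math import log

# Base cells keyed by (year, state, light); values are term counts (toy data).
base_cells = {
    (2007, "IL", "daytime"): Counter({"engine": 3, "runway": 1}),
    (2007, "IL", "night"): Counter({"engine": 1, "fog": 2}),
    (2007, "CA", "daytime"): Counter({"runway": 4}),
}


def roll_up(cells, year=None, state=None, light=None):
    """Sum term counts over all base cells matching the given values (None means '*')."""
    total = Counter()
    for (y, s, l), tf in cells.items():
        if year in (None, y) and state in (None, s) and light in (None, l):
            total.update(tf)
    return total


def idf(cells):
    """IDF with each base cell treated as one document (an assumption of this sketch)."""
    n = len(cells)
    df = Counter()
    for tf in cells.values():
        df.update(tf.keys())
    return {t: log(n / d) for t, d in df.items()}


def top_terms(cells, k=3, **dims):
    """Summarize an aggregated cell by its k highest TF-IDF terms."""
    tf = roll_up(cells, **dims)
    weights = idf(cells)
    scored = {t: c * weights[t] for t, c in tf.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:k]
```

For example, `top_terms(base_cells, year=2007, state="IL")` summarizes the cell (Time.year = 2007, Place.state = IL) by summing the daytime and night cells first, without revisiting the underlying reports.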
Duo Zhang, UIUC
Presentation File--- Efficient Aggregation of Topic Cube
Refer to Summary080318
One main problem in topic cube construction is how to aggregate topic models from cells. In this report, we propose one heuristic method to solve this problem. The efficiency of this method will be tested in our next step