Keywords
ASRS Data Set
- "After each commercial flight in the United States, a report is written for
that flight describing how the flight went and whether any anomalous events
occurred. This data set is a collection of 20,696 of those reports categorized
into 62 different anomalies. Between zero and twelve different anomalies were
assigned to each report" (described in Ashok's paper)
- Accessing ASRS data:
Example Tasks
Please refer to our Publication and Demos
Text OLAP
High-dimensional OLAP analysis of text data
Predict Missing Event Anomalies
Here are some examples of missing-label prediction on ASRS data by our
algorithm: (pdf)
Related Work
Automatic Categorization of ASRS Reports
Problem Description
Existing Solution A
Proposed in this paper
(NLP is expensive: it requires large hand-crafted rule bases)
Experimental Result A
Searching for Recurring Anomalies
Problem Description
Existing Solution A
Proposed in this paper
- Representing Text Documents in a Vector Space: Term-document
matrix (or “Bag of Words” matrix)
- Preprocessing: Reduce the number of dimensions (e.g., PCA)
- Using existing methods: Several clustering methods for
high-dimensional data are examined
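The vector-space representation listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation from the paper: the function names `term_document_matrix` and `cosine` and the three report snippets are invented for the example, and tokenization is a plain whitespace split. Dimension reduction and clustering would then operate on the rows of this matrix.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the count of term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical report snippets, invented for illustration (not ASRS data).
reports = [
    "engine fire warning on climb",
    "engine warning light on descent",
    "runway incursion during taxi",
]
vocab, matrix = term_document_matrix(reports)
sim_01 = cosine(matrix[0], matrix[1])  # the two engine reports share three terms
sim_02 = cosine(matrix[0], matrix[2])  # no shared terms
```

In this toy example the two engine-related reports end up closer to each other than to the runway report, which is the property a clustering method would rely on.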
Existing Solution B
Proposed in this paper
- Each recurring anomaly document is generated by: a general
English language model (the choice of words), a topic model (type of the
problem), and a document-specific information model (problem details)
- Solve a multinomial distribution parameter estimation
problem
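To give a feel for this kind of estimation problem, here is a deliberately simplified sketch. It is not the model from the paper: it uses only two components (a fixed background English model and one topic model) instead of three, and it assumes the mixing weight `lam` and the background probabilities are known. The function name `estimate_topic_model`, the toy documents, and the background values are all invented for illustration. The topic multinomial is estimated with a few EM iterations.

```python
from collections import Counter

def estimate_topic_model(docs, background, lam=0.5, iterations=50):
    """EM for p(w) = lam * background[w] + (1 - lam) * topic[w].

    Only the topic multinomial is estimated; background and lam stay fixed.
    """
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    vocab = list(counts)
    topic = {w: 1.0 / len(vocab) for w in vocab}  # uniform starting point
    for _ in range(iterations):
        # E-step: probability that an occurrence of w came from the topic model.
        resp = {}
        for w in vocab:
            t = (1 - lam) * topic[w]
            b = lam * background.get(w, 1e-6)  # small floor for unseen words (assumption)
            resp[w] = t / (t + b)
        # M-step: re-estimate the multinomial from the expected counts.
        expected = {w: counts[w] * resp[w] for w in vocab}
        total = sum(expected.values())
        topic = {w: c / total for w, c in expected.items()}
    return topic

# Toy inputs, invented for illustration (not ASRS data).
docs = ["the hydraulic leak the", "the hydraulic pump the"]
background = {"the": 0.5, "hydraulic": 0.001, "leak": 0.001, "pump": 0.001}
topic = estimate_topic_model(docs, background)
```

Although "the" occurs twice as often as "hydraulic" in the toy documents, most of its occurrences are explained by the background model, so the estimated topic model gives "hydraulic" the higher probability.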