Keywords
ASRS Data Set
-
After each commercial flight in the United States, a report
is written for that flight describing how the flight went and whether any
anomalous events occurred. This data set is a collection of 20,696 of those
reports categorized into 62 different anomalies. Between zero and twelve
different anomalies were assigned to each report. [1]
-
Accessing ASRS data:
-
Tasks
Automatic Categorization of ASRS Reports
Problem Description
Existing Solution A
Proposed in [8]
(NLP-based approaches are expensive: they rely on large hand-crafted rule bases)
Experimental Result A
Searching for Recurring Anomalies
Problem Description
-
Clustering problem
-
Recurring Anomalies - "anomalies that may be described in
different ways by different authors, at varying times and under varying
conditions, but that are truly about the same part of the system." [1]
-
Given a set of N documents, where each document is a free
text English document that describes a problem, an observation, a treatment, a
study, or some other aspect of the vehicle, automatically identify a set of
potential recurring anomalies in the reports. [3]
-
Related Problems: Topic Detection and Tracking
Existing Solution A
Proposed in [3]
-
Representing Text Documents in a Vector Space: Term-document
matrix (or “Bag of Words” matrix)
-
Preprocessing: Reduce the number of dimensions (e.g.,
PCA)
-
Using existing methods: several clustering methods for
high-dimensional data are examined
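The three steps listed for the approach in [3] (vector-space representation, dimensionality reduction, clustering) can be illustrated with a minimal sketch. It is not the implementation from [3]: the whitespace tokenizer, the SVD-based PCA, the choice of plain k-means with a deterministic farthest-point start, and the function names `term_document_matrix`, `pca_reduce`, and `kmeans` are all assumptions made for illustration, and NumPy is assumed to be available.

```python
from collections import Counter

import numpy as np


def term_document_matrix(docs):
    """Return (vocab, X) where X[i, j] is the count of term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            X[i, index[term]] = count
    return vocab, X


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T


def kmeans(points, k, iters=20):
    """Plain k-means; a stand-in for the clustering methods examined in [3]."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


# Toy reports (invented for illustration, not taken from the ASRS data set).
docs = [
    "hydraulic pump leak left engine",
    "hydraulic leak left engine",
    "altitude deviation during descent",
    "altitude deviation descent clearance",
]
vocab, X = term_document_matrix(docs)
reduced = pca_reduce(X, k=2)
labels = kmeans(reduced, k=2)
```

On real report collections the matrix is large and sparse, so a sparse representation and a truncated decomposition would likely be preferable to the dense arrays used here.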
Existing Solution B
Proposed in [1]
-
Each recurring anomaly document is generated by: a general
English language model (the choice of words), a topic model (type of the
problem), and a document-specific information model (problem details)
-
Solve the resulting multinomial distribution parameter
estimation problem
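To make the parameter estimation step concrete, here is a simplified sketch. It is not the model from [1]: it keeps only two of the three components (a fixed general-language model and one topic model, omitting the document-specific model), fixes the mixing weight `lam` by hand, and fits the topic multinomial with a standard EM loop. The function names `mle_multinomial` and `em_topic_model` and the toy token lists are hypothetical.

```python
from collections import Counter


def mle_multinomial(tokens):
    """Maximum-likelihood multinomial parameters: relative word frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def em_topic_model(doc_tokens, background, lam=0.5, iters=50):
    """Estimate p(w | topic) given a fixed background (general language) model.

    Assumption: each word is drawn from the background with probability lam
    and from the topic with probability 1 - lam.
    """
    counts = Counter(doc_tokens)
    topic = {w: 1.0 / len(counts) for w in counts}  # uniform starting point
    for _ in range(iters):
        # E-step: posterior probability that each word came from the topic.
        resp = {}
        for w in counts:
            p_topic = (1 - lam) * topic[w]
            p_bg = lam * background.get(w, 1e-9)  # small floor for unseen words
            resp[w] = p_topic / (p_topic + p_bg)
        # M-step: re-estimate the topic multinomial from expected counts.
        norm = sum(counts[w] * resp[w] for w in counts)
        topic = {w: counts[w] * resp[w] / norm for w in counts}
    return topic


# Toy data (invented): the background is dominated by the common word "the".
collection = "the the the the the the pump leak altitude of".split()
report = "the the pump leak pump".split()

background = mle_multinomial(collection)
topic = em_topic_model(report, background)
```

Because the background model already explains common words such as "the", the fitted topic distribution shifts probability mass toward the words that describe the problem itself, which is the intuition behind separating general language from topic content.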
Text OLAP
High-dimensional OLAP analysis of text data
Reference
Refer to our reference page