
Event Cube Project

Keywords

  • ASRS
  • Proactive risk management
  • What Happened, and Why
  • Categorizing and detecting anomalies
  • Safety documents

 

 

ASRS Data Set

  • After each commercial flight in the United States, a report is written for that flight describing how the flight went and whether any anomalous events occurred. This data set is a collection of 20,696 of those reports categorized into 62 different anomalies. Between zero and twelve different anomalies were assigned to each report. [1]
  • Accessing ASRS data:
  • Analysis of the ASRS data set (refer to Discussion Summary - Summary080129)

 

 

Tasks

 

Automatic Categorization of ASRS Reports

Problem Description

  • Classification problem
  • Obtain a mapping: ASRS report extract => ASRS anomaly categories (open question: a single one-to-many mapping, or many one-to-one mappings, one per category?)

 

  • Automatically categorize new reports
  • Predict missing (event anomaly) labels
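
The "many one-to-one mappings" option above can be sketched as follows. This is a minimal illustration only: each report carries a set of anomaly labels, and the multi-label problem is split into one yes/no target per category. The function name and the category names are hypothetical and do not come from the ASRS data.

```python
from typing import Dict, List, Set

def to_binary_tasks(labels: List[Set[str]]) -> Dict[str, List[int]]:
    """Split one multi-label problem into one 0/1 target vector per category."""
    categories = sorted(set().union(*labels))
    return {
        cat: [1 if cat in doc_labels else 0 for doc_labels in labels]
        for cat in categories
    }

# Hypothetical label sets for three reports (a report may have zero labels).
report_labels = [{"altitude_deviation", "atc_issue"}, set(), {"atc_issue"}]
binary_tasks = to_binary_tasks(report_labels)

for category, targets in binary_tasks.items():
    print(category, targets)
```

Each target vector can then be handed to a separate binary classifier; predicting missing labels amounts to running every per-category classifier on the report.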

 

Existing Solution A

Proposed in [8]

  • Convert documents into a vector space representation “Bag of Words” matrix
  • SVM with Natural Language (Pre)Processing (NLP)

     (NLP is expensive: it requires large hand-crafted rule bases)

  • Mariana (an advanced Markov Chain Monte Carlo algorithm to find the best SVM hyperparameters) without NLP
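
The bag-of-words plus SVM pipeline can be sketched as below for a single anomaly category. Note the assumptions: this is a plain linear SVM trained by subgradient descent on the hinge loss, with fixed hyperparameters, raw whitespace tokens and made-up example reports. It is not Mariana (no MCMC hyperparameter search) and not the implementation from [8].

```python
from collections import Counter
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    # Raw text only: lowercase and split on whitespace, no NLP preprocessing.
    return text.lower().split()

def build_vocab(docs: List[str]) -> Dict[str, int]:
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(doc: str, vocab: Dict[str, int]) -> List[float]:
    """One row of the bag-of-words matrix: term counts in vocabulary order."""
    row = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocab:
            row[vocab[tok]] = float(n)
    return row

def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 200, lr: float = 0.1) -> Tuple[List[float], float]:
    """Subgradient descent on the L2-regularized hinge loss; labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up reports; +1 means "has the anomaly", -1 means "does not".
docs = ["engine fire on climb", "smooth flight no issues",
        "engine failure after takeoff", "routine flight smooth landing"]
y = [1, -1, 1, -1]

vocab = build_vocab(docs)
X = [bag_of_words(d, vocab) for d in docs]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

In this setting, the hyperparameters fixed here (`lam`, `lr`, `epochs`) are the kind of values a search procedure such as Mariana would tune instead.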

Experimental Result A

  • Mariana without NLP (with raw text only) = SVM with NLP > other methods with NLP

 

Searching for Recurring Anomalies

Problem Description

  • Clustering problem
  • Recurring Anomalies - "anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system." [1]
  • Given a set of N documents, where each document is a free text English document that describes a problem, an observation, a treatment, a study, or some other aspect of the vehicle, automatically identify a set of potential recurring anomalies in the reports. [3]
  • Related Problems: Topic Detection and Tracking

Existing Solution A

Proposed in [3]

  • Representing Text Documents in a Vector Space: Term-document matrix (or “Bag of Words” matrix)
  • Preprocessing: Reduce the number of dimensions (e.g., PCA)
  • Using existing methods: several clustering methods for high-dimensional data are examined
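
To make the vector-space clustering idea concrete, here is a small sketch. It is an illustration under assumptions, not one of the methods examined in [3]: it uses single-pass "leader" clustering with cosine similarity on raw term counts, skips the dimension-reduction step, and the threshold and example reports are made up.

```python
import math
from collections import Counter
from typing import List

def term_vector(text: str) -> Counter:
    # One column of the term-document matrix, stored sparsely as term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def leader_cluster(docs: List[str], threshold: float = 0.3) -> List[int]:
    """Assign each document to the most similar earlier leader, else start a new cluster."""
    leaders: List[Counter] = []
    assignment: List[int] = []
    for doc in docs:
        vec = term_vector(doc)
        sims = [cosine(vec, lead) for lead in leaders]
        if sims and max(sims) >= threshold:
            assignment.append(sims.index(max(sims)))
        else:
            leaders.append(vec)
            assignment.append(len(leaders) - 1)
    return assignment

# Made-up reports: two describe a hydraulic leak, two describe radio static.
reports = [
    "hydraulic pump leak found during inspection",
    "radio static on approach frequency",
    "leak observed near hydraulic pump housing",
    "approach frequency radio had heavy static",
]
clusters = leader_cluster(reports)
print(clusters)  # [0, 1, 0, 1]
```

Reports that share a cluster index are candidates for the same recurring anomaly, even though their authors worded them differently.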

Existing Solution B

Proposed in [1]

  • A new mixture model: assume each document is generated by a distinct multinomial distribution
  1. Each recurring anomaly document is generated by: a general English language model (the choice of words), a topic model (type of the problem), and a document-specific information model (problem details)
  2. Solve the resulting multinomial parameter estimation problem
  • Discover and cluster recurring anomalies based on the distance between different distributions
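
One way to write down the three-component model is sketched below. The notation is assumed here for illustration and is not taken from [1]: the mixing weights, the symbol z(d) for the topic of document d, and the choice of symmetric KL divergence as the distance are all assumptions.

```latex
% Assumed notation: \theta_B = general English model, \theta_{z(d)} = topic model
% of document d, \theta_d = document-specific model, \lambda_* = mixing weights.
p(w \mid d) = \lambda_B \, p(w \mid \theta_B)
            + \lambda_T \, p(w \mid \theta_{z(d)})
            + \lambda_D \, p(w \mid \theta_d),
\qquad \lambda_B + \lambda_T + \lambda_D = 1

% One possible distance between two estimated word distributions
% (an assumption; [1] may use a different measure):
D(\theta_i, \theta_j) = \tfrac{1}{2} \sum_{w} \left[
    p(w \mid \theta_i) \log \frac{p(w \mid \theta_i)}{p(w \mid \theta_j)}
  + p(w \mid \theta_j) \log \frac{p(w \mid \theta_j)}{p(w \mid \theta_i)}
\right]
```

Under this reading, documents whose estimated distributions lie close together under D would be grouped as one recurring anomaly.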

 

Text OLAP 

High-dimensional OLAP analysis of text data

 

 

Reference

 

Refer to our reference page