
About Event Cube

Keywords

  • ASRS
  • Proactive risk management
  • What Happened, and Why
  • Categorizing and detecting anomalies
  • Safety documents

 

 

ASRS Data Set

 

 

Example Tasks

Please refer to our Publication and Demos

 

Text OLAP

High-dimensional OLAP analysis of text data

 

Predict Missing Event Anomalies

Here are some examples of missing-label prediction on ASRS data by our algorithm: (pdf)

 

 

Related Work

Automatic Categorization of ASRS Reports

Problem Description

  • Classification problem
  • Obtain a mapping: ASRS Report Extract => ASRS Anomaly Categories (a single one-to-many mapping, or many one-to-one mappings?)
  • Automatically categorize new reports
  • Predict missing (event anomaly) labels
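
One way to read the "many one-to-one mappings" option is to split the multi-label problem into one yes/no problem per anomaly category. The sketch below only illustrates that decomposition; the category names and the helper `to_binary_tasks` are made up for the example and do not come from the ASRS taxonomy.

```python
from typing import Dict, List, Set

def to_binary_tasks(labels: List[Set[str]]) -> Dict[str, List[int]]:
    """Turn one multi-label problem into one 0/1 target vector per category."""
    categories = sorted(set().union(*labels))
    return {
        cat: [1 if cat in doc_labels else 0 for doc_labels in labels]
        for cat in categories
    }

# Hypothetical anomaly labels for three reports (illustrative names only).
report_labels = [
    {"altitude deviation", "communication breakdown"},
    {"runway incursion"},
    {"altitude deviation"},
]

tasks = to_binary_tasks(report_labels)
for category, targets in tasks.items():
    print(category, targets)
```

Each target vector can then be handed to its own binary classifier. A label that a classifier predicts with high confidence but that is absent from a report is a candidate missing anomaly label.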

Existing Solution A

Proposed in this paper

  • Convert documents into a vector space representation “Bag of Words” matrix
  • SVM with Natural Language (Pre)Processing (NLP); NLP is expensive because it relies on large hand-crafted rule bases
  • Mariana (an advanced Markov Chain Monte Carlo algorithm to find the best SVM hyperparameters) without NLP
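
The "Bag of Words" plus SVM pipeline can be sketched on raw text as follows. This is a minimal illustration, not the paper's implementation: the report extracts are invented, the linear SVM is trained by plain sub-gradient descent, and the hyperparameters `lam` and `eta` are fixed by hand rather than searched for as Mariana does.

```python
import re
from collections import Counter
from typing import List, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only (no NLP rules)."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted vocabulary and the document-by-term count matrix."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Sub-gradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink step from the regularizer
            if margin < 1.0:  # hinge loss is active for this document
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def score(text: str, vocab: List[str], w: List[float]) -> float:
    """Signed decision value for a new document; unseen words are ignored."""
    counts = Counter(tokenize(text))
    return sum(wj * counts[term] for wj, term in zip(w, vocab))

# Hypothetical report extracts: +1 = "altitude deviation", -1 = any other category.
reports = [
    "altitude deviation during climb",
    "altitude deviation after clearance",
    "runway incursion during taxi",
    "runway incursion at night",
]
labels = [1, 1, -1, -1]

vocab, X = bag_of_words(reports)
w = train_linear_svm(X, labels)
print(score("altitude deviation reported by crew", vocab, w) > 0)
```

A positive score assigns the report to the category and a negative score rejects it. In practice the regularization strength would be tuned, which is the part Mariana automates.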

Experimental Result A

  • Mariana without NLP (raw text only) performs on par with SVM with NLP, and both outperform the other methods with NLP

 

Searching for Recurring Anomalies

Problem Description

  • Clustering problem
  • Recurring Anomalies - "anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system." (studied in this paper)
  • Given a set of N documents, where each document is a free text English document that describes a problem, an observation, a treatment, a study, or some other aspect of the vehicle, automatically identify a set of potential recurring anomalies in the reports (studied in this paper)
  • Related Problems: Topic Detection and Tracking

Existing Solution A

Proposed in this paper

  • Representing Text Documents in a Vector Space: Term-document matrix (or “Bag of Words” matrix)
  • Preprocessing: Reduce the number of dimensions (e.g., PCA)
  • Using existing methods: several clustering methods for high-dimensional data are examined
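
The vector-space approach can be illustrated with a small sketch. It is not one of the clustering methods examined in the paper: it skips dimensionality reduction, compares raw term-count vectors by cosine similarity, and groups documents by single-link merging at a hand-picked threshold. The reports and the `threshold` value are assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def vectorize(text: str) -> Counter:
    """Sparse bag-of-words vector: term -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Single-link grouping: join documents whose similarity is >= threshold."""
    vecs = [vectorize(d) for d in docs]
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical reports: two authors describe each problem in different words.
reports = [
    "hydraulic pump leak found near left wing",
    "leak found at hydraulic pump on left wing",
    "cabin light flickers during taxi",
    "flickering cabin light seen during taxi",
]
clusters = cluster(reports)
print(clusters)
```

Each group of document indices is a candidate recurring anomaly. With real reports the vocabulary is far larger, which is why the preprocessing step reduces the number of dimensions first.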

Existing Solution B

Proposed in this paper

  • A new mixture model: assume each document is generated by a distinct multinomial distribution
  1. Each recurring anomaly document is generated by: a general English language model (the choice of words), a topic model (type of the problem), and a document-specific information model (problem details)
  2. Solve multinomial distribution parameter estimation problem
  • Discover and cluster recurring anomalies based on the distance between different distributions
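
The last step, comparing documents through the distance between their word distributions, can be sketched as follows. This is a simplified illustration under stated assumptions: each document gets a single add-one-smoothed multinomial instead of the three-component mixture, and the Jensen-Shannon divergence is used as the distance because the notes above do not say which measure the paper uses. The example reports are invented.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def multinomial(text: str, vocab: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed maximum-likelihood estimate of a document's word distribution."""
    counts = Counter(tokenize(text))
    total = sum(counts[term] for term in vocab) + alpha * len(vocab)
    return {term: (counts[term] + alpha) / total for term in vocab}

def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits; 0 for identical distributions, at most 1."""
    def kl(a: Dict[str, float], m: Dict[str, float]) -> float:
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)
    mid = {t: 0.5 * (p[t] + q[t]) for t in p}
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

# Hypothetical reports: the first two describe the same part of the system.
doc_a = "fuel valve stuck open"
doc_b = "fuel valve stuck closed"
doc_c = "radio static on approach"

vocab = sorted(set(tokenize(doc_a) + tokenize(doc_b) + tokenize(doc_c)))
p_a, p_b, p_c = (multinomial(d, vocab) for d in (doc_a, doc_b, doc_c))

print(js_divergence(p_a, p_b) < js_divergence(p_a, p_c))
```

Documents whose distributions are close under the chosen distance would be placed in the same cluster of potential recurring anomalies.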