

Explaining Query Answers with Causal Relationships

When analyzing data sets, users are often interested in the causes of their observations: "What caused my personalized newsfeed to contain more than 10 items related to volcanos?", "Why can't I find any flights with my search criteria?". Database research that addresses these or similar questions is mainly work on lineage of query results, such as why or where provenance, and, very recently, explanations for non-answers. While these approaches differ on what the response to such questions should be, all of them seem to be linked through a common underlying theme: understanding causal relationships.

Causal relationships cannot be explicitly modeled in current database systems, which offer no specific support for such queries. Data mining techniques can infer statistically significant data patterns, but these are not sufficient to draw conclusions, as correlation does not necessarily imply causation.

The goal of this project is to extend the capabilities of current database systems by incorporating causal reasoning into them. This will allow databases to model causal dependencies, and users to issue queries that can interpret them to provide explanations for their observations. Starting from the very basic functionality of justifying the presence or absence of results for a given query, causality-enabled databases can find many practical applications. For an intuitive introduction and several motivating examples, see the Data Bulletin article.

Applications in error diagnosis and cleaning

Data transformations from a source to a target dataset are ubiquitous today and can be found in data integration, data exchange, and ETL tools. When a user notices that an item in a target data instance is incorrect (the tuple should not be there, or some of its attribute values are erroneous), she would like to find out which of the many input tuples that contributed to the incorrect output is faulty. It is critical that the error be traced and corrected in the source data, because once an error is identified, one can prevent it from propagating further. This is a form of "post-factum" data cleaning: while in standard data cleaning one corrects errors before the data is transformed and integrated, in our setting the errors are detected only after the data has been transformed.

Example (Recommendation System): Consider a new-generation smart phone. It is equipped with several sensors, such as GPS, accelerometer, light, and cell tower. Based on these sensors, a set of classifiers can predict the owner's current activities (e.g., walking, driving, working, being with family, or being in a business meeting). Knowledge of the user's current activities allows the system to serve the user with useful targeted recommendations. For example, if the user is away from their car around lunch time, the application will suggest restaurants within walking distance. This is an example of data transformation: the source data (input) are the sensor readings, and the target data (output) are the activities.

Errors in sensor data are a common occurrence: sensors have innate imprecisions (e.g., the GPS may miscalculate the current location), or some sensors may become inhibited (e.g., if the user places the cellphone in the glove compartment while driving, then the light sensor's reading is incorrect). As a consequence, an inferred activity may be wrong. The application can often detect such errors from user feedback or based on the user's subsequent actions and reactions to the provided recommendations. The challenge is to actually identify the responsible sensor(s). The system could then inhibit the reading from that sensor and therefore improve the inferred activities.

A Characterization of the Complexity of Resilience and Responsibility for Self-join-free Conjunctive Queries
Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, and Alexandra Meliou. (appears at VLDB 2016)

Causality and Explanations in Databases (Tutorial)
Alexandra Meliou, Sudeepa Roy, and Dan Suciu.

Bringing Provenance to its Full Potential Using Causal Reasoning
Alexandra Meliou, Wolfgang Gatterbauer, and Dan Suciu.

Tracing Data Errors with View-Conditioned Causality
Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, and Dan Suciu.

Wolfgang Gatterbauer, Alexandra Meliou, Dan Suciu

The Complexity of Causality and Responsibility for Query Answers and non-Answers
Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. (appears at VLDB 2011)
Full 15-page version with all proofs (Version Sept 2010)

Alexandra Meliou, Wolfgang Gatterbauer, Joseph Halpern, Christoph Koch, Katherine F.
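
The recommendation-system example can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the project's method: the rule-based `infer_activity` classifier, the sensor names and thresholds, and the `responsibility` helper are all hypothetical. The score loosely follows the notion of responsibility from the causality literature, 1 / (1 + size of the smallest contingency set), applied here to sensors rather than database tuples.

```python
from itertools import combinations

# Hypothetical readings: the phone sits in the glove compartment while
# driving, so the light sensor reports darkness.
READINGS = {"gps_speed_kmh": 60.0, "accel_periodic": False, "light_lux": 2.0}


def infer_activity(readings):
    """Toy stand-in for the classifiers; a missing sensor falls back to a default."""
    if readings.get("light_lux", 1000.0) < 10.0:
        return "resting"
    if readings.get("gps_speed_kmh", 0.0) > 20.0:
        return "driving"
    if readings.get("accel_periodic", False):
        return "walking"
    return "working"


def without(readings, sensors):
    """Return a copy of the readings with the given sensors inhibited."""
    return {k: v for k, v in readings.items() if k not in sensors}


def responsibility(readings, sensor, wrong_output):
    """Return 1 / (1 + size of the smallest contingency set), or 0.0 if none exists.

    A contingency set is a set of other sensors such that inhibiting it keeps
    the wrong output, while inhibiting it together with `sensor` removes it.
    """
    others = [s for s in readings if s != sensor]
    for size in range(len(others) + 1):
        for contingency in combinations(others, size):
            keeps = infer_activity(without(readings, contingency)) == wrong_output
            flips = (
                infer_activity(without(readings, set(contingency) | {sensor}))
                != wrong_output
            )
            if keeps and flips:
                return 1.0 / (1 + size)
    return 0.0


wrong = infer_activity(READINGS)  # the activity the user flagged as wrong
scores = {s: responsibility(READINGS, s, wrong) for s in READINGS}
print(wrong, scores)
```

Under these made-up rules, only the light sensor receives a nonzero score, and inhibiting it changes the inferred activity from "resting" to "driving". The exhaustive search over contingency sets is exponential in the number of sensors, which is acceptable only for a toy setting like this one.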
