EOP

Excitement Open Platform for Recognizing Textual Entailment

Introduction

The Excitement Open Platform (EOP) is a generic architecture and a comprehensive implementation for textual inference in multiple languages. The platform includes state-of-the-art algorithms, a large number of knowledge resources, and facilities for experimenting with and testing new approaches. The EOP is one of the main outcomes of the project EXCITEMENT - EXploring Customer Interactions through Textual EntailMENT. It provides readily available Recognizing Textual Entailment (RTE) technology and a modular architecture covering text preprocessing, entailment engines, and several knowledge resources. The current EOP version covers three languages (English, German, and Italian) and includes tools for creating new resources in other languages. The EOP is designed to support several research use cases, and its distribution package provides a number of utilities.


Recognizing Textual Entailment

Textual Entailment is a directional relation between text fragments. Given two text fragments, one named Text (T) and the other named Hypothesis (H), the Recognizing Textual Entailment task consists of deciding whether the Hypothesis can be inferred from the Text. We use a graduated definition of entailment: T entails H (T ⇒ H) if, typically, a human reading T would infer that H is most likely true. The following is an example of positive entailment:

  • Text: If you help the needy, God will reward you.
  • Hypothesis: Giving money to a poor man has good consequences.

More on Recognizing Textual Entailment can be found at the ACLwiki Textual Entailment Portal.


Architecture

The EOP takes T-H pairs as input and outputs an entailment judgement: "Entailment" if T entails H, or "NonEntailment" if the relation does not hold. The EOP architecture is modular, with pluggable and replaceable components that enable extension and customization.
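The input/output contract described above can be illustrated with a minimal Python sketch. It is not the EOP API (the EOP itself is a Java/Maven project): the names `THPair`, `Decision`, `run_platform`, `lowercase_tokens`, and `subset_decider` are hypothetical and exist only to show a T-H pair going in, a judgement coming out, and the preprocessing and decision steps being swappable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    """The two output labels named in the text."""
    ENTAILMENT = "Entailment"
    NON_ENTAILMENT = "NonEntailment"


@dataclass(frozen=True)
class THPair:
    """A Text/Hypothesis pair (hypothetical container, not an EOP class)."""
    text: str
    hypothesis: str


# Hypothetical component types: any function with these shapes can be plugged in.
Preprocessor = Callable[[str], List[str]]
Decider = Callable[[List[str], List[str]], Decision]


def run_platform(pair: THPair, preprocess: Preprocessor, decide: Decider) -> Decision:
    """Preprocess both sides of the pair, then hand them to the decision component."""
    return decide(preprocess(pair.text), preprocess(pair.hypothesis))


def lowercase_tokens(s: str) -> List[str]:
    """Toy preprocessing: lowercase and split on whitespace."""
    return s.lower().split()


def subset_decider(t_tokens: List[str], h_tokens: List[str]) -> Decision:
    """Toy decision rule: every hypothesis token must occur in the text."""
    if set(h_tokens) <= set(t_tokens):
        return Decision.ENTAILMENT
    return Decision.NON_ENTAILMENT
```

Because `run_platform` only depends on the shapes of its two function arguments, either component can be replaced without touching the other, which is the property the modular design aims for.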

The Linguistic Analysis Pipeline (LAP) is a collection of annotation components for Natural Language Processing (NLP) where component integration is based on the Apache UIMA framework. It enables interoperability among components while ensuring language independence.
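As a rough illustration of how a pipeline of annotation components can interoperate, the sketch below passes one shared analysis object through a list of annotators, each adding its own layer. This is loosely analogous to the way UIMA components communicate through a common analysis structure, but it is a toy under stated assumptions: `Analysis`, `tokenizer`, `lowercaser`, and `run_pipeline` are hypothetical names, not UIMA or EOP classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Analysis:
    """Shared object that every annotator reads from and writes to."""
    text: str
    layers: Dict[str, List[str]] = field(default_factory=dict)


# An annotator is any function that enriches the shared analysis in place.
Annotator = Callable[[Analysis], None]


def tokenizer(doc: Analysis) -> None:
    """Toy annotator: adds a whitespace token layer."""
    doc.layers["tokens"] = doc.text.split()


def lowercaser(doc: Analysis) -> None:
    """Toy annotator: adds a lowercased layer built on the token layer."""
    doc.layers["lower"] = [tok.lower() for tok in doc.layers["tokens"]]


def run_pipeline(text: str, annotators: List[Annotator]) -> Analysis:
    """Run the annotators in order over one shared analysis object."""
    doc = Analysis(text)
    for annotate in annotators:
        annotate(doc)
    return doc
```

For example, `run_pipeline("Hawaii is sunny", [tokenizer, lowercaser])` yields an analysis with both a `tokens` and a `lower` layer. Later annotators may depend on layers produced by earlier ones, so the order of the list matters.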

The Entailment Decision Algorithm (EDA) computes an entailment decision for a given T-H pair, and can use components that provide standardized algorithms or knowledge resources. Currently, the EOP ships with three EDAs, each following a different approach: transformation-based, edit-distance-based, and classification-based.
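To give a feel for the edit-distance-based approach, here is a minimal Python sketch. It is not the EOP's EDA: it assumes whitespace tokenization, unit costs for insertion, deletion, and substitution, and an arbitrary threshold of 0.5, all chosen only for illustration. The function names `token_edit_distance` and `decide` are hypothetical.

```python
from typing import List


def token_edit_distance(source: List[str], target: List[str]) -> int:
    """Levenshtein distance over tokens, with unit cost per operation."""
    prev = list(range(len(target) + 1))
    for i, s_tok in enumerate(source, start=1):
        curr = [i]
        for j, t_tok in enumerate(target, start=1):
            cost = 0 if s_tok == t_tok else 1
            curr.append(min(prev[j] + 1,          # delete s_tok
                            curr[j - 1] + 1,      # insert t_tok
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def decide(text: str, hypothesis: str, threshold: float = 0.5) -> str:
    """Label the pair by comparing a length-normalized distance to a threshold.

    The threshold is an assumption for this sketch; a real system would
    typically tune it on annotated training pairs.
    """
    t_tokens = text.lower().split()
    h_tokens = hypothesis.lower().split()
    distance = token_edit_distance(t_tokens, h_tokens)
    normalized = distance / max(len(t_tokens), len(h_tokens), 1)
    return "Entailment" if normalized <= threshold else "NonEntailment"
```

The intuition is that the cheaper it is to transform T into H, the more likely the entailment holds. A purely surface-level distance like this one misses paraphrases, which is where knowledge resources come in.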

Knowledge Resources are crucial for recognizing cases where T and H use different textual expressions (words, phrases) while preserving entailment (e.g., home --> house, Hawaii --> America, born in --> citizen of). The EOP includes a wide range of lexical and syntactic knowledge resources. Some are derived from manually built resources, such as dictionaries, while others are learned automatically.
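The directional nature of such rules can be sketched in a few lines of Python. This is a toy lookup, not an EOP knowledge resource: the rule table reuses two single-word examples from the paragraph above, and the names `RULES`, `lexically_entails`, and `hypothesis_covered` are hypothetical.

```python
from typing import Dict, Set

# Directional lexical rules: the key entails each word in its value set.
# The entries mirror the examples in the text (home --> house, Hawaii --> America).
RULES: Dict[str, Set[str]] = {
    "home": {"house"},
    "hawaii": {"america"},
}


def lexically_entails(lhs: str, rhs: str, rules: Dict[str, Set[str]] = RULES) -> bool:
    """True if lhs equals rhs or a rule lhs --> rhs exists (not the reverse)."""
    lhs, rhs = lhs.lower(), rhs.lower()
    return lhs == rhs or rhs in rules.get(lhs, set())


def hypothesis_covered(text: str, hypothesis: str,
                       rules: Dict[str, Set[str]] = RULES) -> bool:
    """True if every hypothesis token is entailed by some text token."""
    t_tokens = text.lower().split()
    h_tokens = hypothesis.lower().split()
    return all(any(lexically_entails(t, h, rules) for t in t_tokens)
               for h in h_tokens)
```

With these rules, "she lives in hawaii" covers "she lives in america", but not the other way round, which reflects the directionality of entailment. Multi-word rules such as born in --> citizen of would need phrase matching, which this sketch does not attempt.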


Use Cases

  • Applied Textual Entailment: users not interested in details of RTE but in NLP tasks in which textual entailment can take over part of or all of the semantic processing, such as Question Answering or Intelligent Tutoring.
  • Textual Entailment Development: researchers interested in RTE itself, for example with the goal of developing novel algorithms for detecting entailment.
  • Lexical Semantics Evaluation: researchers whose primary interest is in (lexical) semantics. They want to integrate knowledge resources into the EOP platform and measure their impact on deciding textual entailment.
  • Educational Use: EOP as an educational tool to support academic courses and projects on RTE and inference more generally.

Distribution

  • Communication channels: users/developers mailing list and issue tracking system.
  • Build Automation Tool: EOP is a Maven multi-module project in which all modules share the standard Maven directory structure, making it easier to find files in the project once one is used to Maven.
  • Version Control System: GitHub for code and documentation storage, development, and issue tracking.
  • Continuous Integration: Jenkins for Continuous Integration, a software development practice where developers of a team integrate their work frequently (e.g., daily).
  • Results Archive: a new community-building feature in which EOP users share their experiments and results in a dedicated repository.
  • License: GNU General Public License (GPL) Version 3.

References

B. Magnini, R. Zanoli, I. Dagan, K. Eichler, G. Neumann, T.-G. Noh, S. Pado, A. Stern, O. Levy. The Excitement Open Platform for Textual Inferences. In Proceedings of the ACL demo session, June 2014.

S. Pado, T.-G. Noh, A. Stern, R. Wang, R. Zanoli. Design and Realization of a Modular Architecture for Textual Entailment. Natural Language Engineering. Cambridge University Press, 2014.