Study in IRLAB


Summarizing Answers in Non-Factoid Community Question-Answering

Proposes a model for summarizing answers in non-factoid CQA; by Ren Zhaochun, Maarten de Rijke, and co-authors, WSDM 2017.

Abstract

  • keywords: Community question-answering; Sparse coding; Short text processing; Document summarization
  • goal: summarizing answers in community question-answering (CQA)
  • challenges: requiring passages as answers, shortness, sparsity and diversity.
    • The shortness of answers in non-factoid CQA is an obstacle for document summarization methods in answer summarization.
    • The sparsity of syntactic and context information hinders summarization when short text is represented in traditional ways, i.e. by term frequency or latent topic modeling.
    • Summarization in non-factoid CQA is a recall-oriented problem, in which we need to recall as much relevant information as possible. However, the diverse topic distribution of answers in non-factoid CQA makes it difficult to generate a summary with high recall.
  • this work: proposes a sparse coding-based summarization strategy that includes three core ingredients:

    • short document expansion: extend each answer in a question-answering thread to a more comprehensive representation via entity linking and sentence ranking strategies.
    • sentence vectorization: each sentence is represented as a feature vector trained from a short text convolutional neural network model.
    • sparse-coding optimization framework: use the above sentence representations to estimate the saliency of candidate sentences and Wikipedia sentences as reconstruction items.
  • Dataset: a benchmark dataset (I don't know which one)

  • metrics: ROUGE
  • Multi-document summarization
    • abstractive summaries and extractive summaries
    • extractive: clustering-based and graph-based ranking methods
  • Community question-answer retrieval
    • most recent works focus on the task of passage retrieval
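
Since the notes list ROUGE as the metric and describe the task as recall-oriented, a minimal sketch of ROUGE-N recall may help make that concrete. This is a simplified illustration, not the official ROUGE toolkit: it assumes whitespace tokenization and lowercasing, and the function name `rouge_n_recall` is hypothetical.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate.

    Simplified sketch: lowercases and splits on whitespace; no stemming
    or stopword removal.
    """
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: an n-gram counts at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


summary = "the cat sat"
gold = "the cat sat on the mat"
print(rouge_n_recall(summary, gold, n=1))  # 3 of 6 reference unigrams matched
print(rouge_n_recall(summary, gold, n=2))  # 2 of 5 reference bigrams matched
```

A summary that covers more of the reference content scores higher, which is why diverse answer topics make high recall hard to reach within a short summary.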

Method

  • Document expansion

    • Entity linking: Given a candidate answer $d$, the target of entity linking is to identify the entity $e$ from a knowledge base $\Delta$ that is the most likely referent of each sentence in $d$.
    • QA-based sentence ranking
  • Sentence representation

    • Convolutional neural networks
    • Feature vector generation
  • Answer Summarization: Sparse coding based framework
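
The sparse-coding idea in the last step can be illustrated with a small sketch. It is not the paper's objective, which these notes do not spell out. As a stand-in it assumes a non-negative lasso: the mean sentence vector of a thread is reconstructed as a sparse, non-negative combination of the sentence vectors, and each weight is read as that sentence's saliency. The names `sparse_saliency`, `lam`, and `steps` are hypothetical, and the solver is plain proximal gradient descent.

```python
def sparse_saliency(vectors, lam=0.1, steps=200):
    """Estimate non-negative saliency weights a_j for sentence vectors x_j.

    Assumed stand-in objective (not taken from the paper):
        minimize 0.5 * ||mean(x) - sum_j a_j * x_j||^2 + lam * sum_j a_j,  a_j >= 0
    """
    n, dim = len(vectors), len(vectors[0])
    target = [sum(v[k] for v in vectors) / n for k in range(dim)]

    # Conservative step size: the sum of squared entries upper-bounds the
    # Lipschitz constant of the gradient, so 1 / that sum is a safe step.
    frob_sq = sum(x * x for v in vectors for x in v)
    step = 1.0 / frob_sq if frob_sq > 0 else 1.0

    a = [0.0] * n
    for _ in range(steps):
        recon = [sum(a[j] * vectors[j][k] for j in range(n)) for k in range(dim)]
        residual = [recon[k] - target[k] for k in range(dim)]
        grad = [sum(vectors[j][k] * residual[k] for k in range(dim)) for j in range(n)]
        # Proximal step: shrink by lam (sparsity) and clip at zero (non-negativity).
        a = [max(a[j] - step * (grad[j] + lam), 0.0) for j in range(n)]
    return a


# Toy sentence vectors; in the notes these would come from the CNN model.
sentences = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = sparse_saliency(sentences, lam=0.1)
ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
print(weights, ranked)
```

Under this assumption, an extractive summary would take the highest-weighted sentences until a length budget is reached; a larger `lam` drives more weights to exactly zero.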