Summarizing Answers in Non-Factoid Community Question-Answering
proposes a model for summarizing answers in non-factoid CQA, by Ren Zhaochun and Maarten, WSDM 2017
Abstract
- keywords: Community question-answering; Sparse coding; Short text processing; Document summarization
- goal: summarizing answers in non-factoid community question-answering (CQA)
- challenges: non-factoid questions require passages as answers, and those answers suffer from shortness, sparsity, and diversity.
- Answers in non-factoid CQA are short, which makes conventional document summarization methods hard to apply to answer summarization.
- Short answers carry little syntactic and contextual information, so traditional short-text representations based on term frequency or latent topic models work poorly for summarization.
- Summarization in non-factoid CQA is recall-oriented: the summary should recall as much relevant information as possible. However, answers in non-factoid CQA cover diverse topics, which makes it difficult to generate a summary with high recall.
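To make the sparsity point concrete, here is a minimal sketch assuming whitespace tokenization and raw term counts (the two example answers are made up): two short answers with related meaning share no terms, so their term-frequency cosine similarity is zero.

```python
from collections import Counter
import math

def tf_vector(text):
    # Bag-of-words term-frequency vector over lowercased whitespace tokens
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two hypothetical short answers that give similar advice
a1 = tf_vector("drink warm water")
a2 = tf_vector("try hot tea")
print(cosine(a1, a2))  # 0.0: no shared terms, so TF sees no similarity
```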
this work: proposes a sparse-coding-based summarization strategy with three core ingredients:
- short document expansion: extend each answer in a question-answering thread to a more comprehensive representation via entity linking and sentence ranking strategies.
- sentence vectorization: represent each sentence as a feature vector generated by a convolutional neural network model for short text.
- sparse-coding optimization framework: use the sentence representations above to estimate the saliency of candidate sentences, with Wikipedia sentences serving as reconstruction items.
Dataset: a benchmark dataset (I don’t know which one it is)
- metrics: ROUGE
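ROUGE scores a summary by its n-gram overlap with reference summaries; since the task is recall-oriented, the recall variant is the natural one to look at. Below is a simplified ROUGE-N recall sketch, assuming lowercased whitespace tokens and a single reference. The official ROUGE toolkit adds options such as stemming, stopword removal, and multiple references, and these notes do not record which ROUGE variants the paper reports.

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of contiguous n-grams
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    # Clipped n-gram overlap divided by the number of n-grams in the reference
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```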
Related work
- Multi-document summarization
- two families: abstractive summaries and extractive summaries
- extractive: clustering-based and graph-based ranking methods
- Community question-answering retrieval
- most recent work focuses on the task of passage retrieval
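The graph-based ranking methods listed under extractive summarization typically score sentences with a PageRank-style random walk over a sentence-similarity graph. A minimal sketch in the spirit of LexRank/TextRank, assuming a precomputed symmetric similarity matrix and a damping factor of 0.85 (illustrative choices, not taken from the paper):

```python
import numpy as np

def graph_rank(sim, damping=0.85, tol=1e-8, max_iter=100):
    # Copy so the caller's matrix is not modified, then drop self-similarity
    sim = np.array(sim, dtype=float)
    np.fill_diagonal(sim, 0.0)
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix; isolated sentences jump uniformly
    trans = np.where(row_sums > 0, sim / np.where(row_sums > 0, row_sums, 1.0), 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Power iteration with random-jump probability (1 - damping)
        new = (1 - damping) / n + damping * trans.T @ scores
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new
    return scores

# Hypothetical similarities between three sentences
scores = graph_rank([[1, 0.5, 0.1], [0.5, 1, 0.4], [0.1, 0.4, 1]])
print(scores)
```

The highest-scoring sentences are the most "central" ones and would be extracted first.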
Method
Document expansion
- Entity linking: Given a candidate answer $d$, the goal of entity linking is to identify, for each sentence in $d$, the entity $e$ in a knowledge base $\Delta$ that the sentence most likely refers to.
- QA-based sentence ranking
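A toy sketch of the two document-expansion steps, purely illustrative: the knowledge base `KB`, the Jaccard-overlap linker `link_entity`, and the TF-cosine ranker inside `expand_answer` are hypothetical stand-ins, not the paper's entity linker or ranking function. Each answer sentence is linked to its best-matching entity, that entity's sentences are ranked by similarity to the question plus the answer, and the top `k` are appended to the answer.

```python
from collections import Counter
import math

# Hypothetical toy knowledge base Delta: entity -> (keyword set, entity-page sentences)
KB = {
    "Common_cold": (
        {"cold", "cough", "sneeze", "virus"},
        ["The common cold is a viral infection of the nose and throat.",
         "Rest and fluids help the body recover from a cold.",
         "The common cold is also known as nasopharyngitis."],
    ),
    "Tea": (
        {"tea", "drink", "leaves"},
        ["Tea is a drink made from the leaves of the tea plant."],
    ),
}

def tokens(text):
    # Lowercased whitespace tokens with trailing punctuation stripped
    return [w.strip(".,?!").lower() for w in text.split()]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_entity(sentence, kb):
    # Toy entity linking: entity whose keywords have the highest Jaccard overlap
    toks = set(tokens(sentence))
    best, best_score = None, 0.0
    for entity, (keywords, _) in kb.items():
        score = len(toks & keywords) / len(toks | keywords)
        if score > best_score:
            best, best_score = entity, score
    return best

def expand_answer(question, answer_sentences, kb, k=1):
    # Toy QA-based ranking: TF cosine of each entity sentence to question + answer
    qa = Counter(tokens(question) + [t for s in answer_sentences for t in tokens(s)])
    expansion = []
    for sentence in answer_sentences:
        entity = link_entity(sentence, kb)
        if entity is None:
            continue
        pool = [s for s in kb[entity][1] if s not in expansion]
        pool.sort(key=lambda s: cosine(Counter(tokens(s)), qa), reverse=True)
        expansion.extend(pool[:k])
    return answer_sentences + expansion

print(expand_answer("How do I get over a cold?",
                    ["Rest a lot when you have a cold.", "Drink hot tea."], KB))
```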
Sentence representation
- Convolutional neural networks
- Feature vector generation
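One common way to turn a short sentence into a feature vector with a CNN is convolution over windows of word embeddings, a ReLU, and max-over-time pooling. The numpy sketch below shows only that forward pass; the window size, number of filters, and pooling choice are assumptions, and the weights here are random rather than trained as in the paper.

```python
import numpy as np

def cnn_sentence_vector(word_vectors, filters, biases):
    # word_vectors: (sentence_len, emb_dim) word embeddings
    # filters: (num_filters, window, emb_dim); biases: (num_filters,)
    word_vectors = np.asarray(word_vectors, dtype=float)
    n, emb_dim = word_vectors.shape
    num_filters, window, _ = filters.shape
    if n < window:  # zero-pad very short sentences so one window fits
        word_vectors = np.vstack([word_vectors, np.zeros((window - n, emb_dim))])
        n = window
    feature_maps = np.empty((num_filters, n - window + 1))
    for i in range(n - window + 1):
        patch = word_vectors[i:i + window]  # (window, emb_dim)
        # Response of every filter at position i
        feature_maps[:, i] = np.tensordot(filters, patch, axes=([1, 2], [0, 1])) + biases
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    return feature_maps.max(axis=1)  # max-over-time pooling -> (num_filters,)

rng = np.random.default_rng(0)
vec = cnn_sentence_vector(rng.normal(size=(5, 8)), rng.normal(size=(4, 3, 8)), np.zeros(4))
print(vec.shape)  # (4,)
```

Max-over-time pooling gives a fixed-length vector regardless of sentence length, which suits short, variable-length answers.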
Answer Summarization: Sparse-coding-based framework
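These notes don't record the exact objective, so what follows is a generic sparse-coding (row-sparse self-representation) sketch, assuming sentence vectors are stacked as the columns of `X`. Every sentence is reconstructed from all sentences through a coefficient matrix `A`; a group-lasso penalty on the rows of `A` forces only a few sentences to do the reconstructing, and each sentence's row norm serves as its saliency. The penalty `lam`, the ISTA solver, and the `candidate_idx` convention (answer sentences are eligible for the summary; other columns, such as Wikipedia expansion sentences, take part in the reconstruction but are never selected) are illustrative assumptions.

```python
import numpy as np

def sparse_coding_saliency(X, lam=0.1, n_iter=500):
    # X: (dim, n_sentences), one sentence vector per column.
    # Approximately minimizes 0.5 * ||X - X A||_F^2 + lam * sum_i ||A[i, :]||_2 via ISTA.
    X = np.asarray(X, dtype=float)
    G = X.T @ X  # Gram matrix, shape (n, n)
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)  # 1 / Lipschitz constant of the smooth term
    A = np.zeros_like(G)
    for _ in range(n_iter):
        Z = A - step * (G @ A - G)  # gradient step on the reconstruction loss
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12))
        A = shrink * Z  # row-wise group soft-thresholding
    return np.linalg.norm(A, axis=1)  # saliency score per sentence

def summarize(sentences, X, candidate_idx, lam=0.1, k=2):
    # Rank only candidate answer sentences; keep the top k in original order
    scores = sparse_coding_saliency(X, lam)
    ranked = sorted(candidate_idx, key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 6))  # 4 hypothetical answer sentences + 2 expansion sentences
print(summarize([f"s{i}" for i in range(6)], X, candidate_idx=[0, 1, 2, 3], lam=0.5))
```

A larger `lam` zeroes out more rows of `A`, so fewer sentences receive non-zero saliency.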