task formalization level
The expressive capacity of triples themselves is limited. In the Musk case, for example, the hierarchical dependency among time, location, position and person has to be represented in a higher-dimensional space than flat triples allow (illustrated below).
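A minimal sketch of the point, with invented companies and dates (hypothetical data, not from the dataset): flat triples cannot say which position goes with which company and period, while a nested, higher-dimensional record can.

```python
# Flat triples: each fact stands alone, so it is ambiguous which
# position belongs to which company and which period of time.
flat_triples = [
    ("Musk", "works_for", "SpaceX"),
    ("Musk", "works_for", "Tesla"),
    ("Musk", "holds_position", "CEO"),
    ("Musk", "holds_position", "Chairman"),
]

# A nested record keeps the time/position/company dependencies intact.
structured_record = {
    "person": "Musk",
    "employments": [
        {"company": "SpaceX", "position": "CEO",      "period": "2002-present"},
        {"company": "Tesla",  "position": "Chairman", "period": "2004-2018"},
    ],
}
```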
algorithm level
* Input: a raw sentence with two marked mentions. Output: whether a relation holds between the two mentions.
* It is hard for neural models to capture all the lexical, semantic and syntactic cues in this formalization, especially when (1) the entities are far apart; (2) one entity is involved in **multiple triplets**; or (3) relation spans overlap.

related work
* Extracting Entities and Relations
    * Pipelined approach. Advantage: **flexibility** in integrating different data sources and learning algorithms. Disadvantage: suffers significantly from error propagation.
    * Joint approaches that model various dependencies and constraints: solved by integer linear programming; card-pyramid parsing; global probabilistic graphical models; structured perceptron with efficient beam search; the table-filling approach; search orders in decoding and global features; shared parameters.
    * End-to-end approaches that extract entities and their relations using neural network models: a neural tagging model; a multi-class classification model based on tree LSTMs; multi-level attention CNNs; seq2seq models that generate entity-relation triples; reinforcement learning or Minimum Risk Training, e.g., a global loss function to jointly train the two models under the framework of Minimum Risk Training; hierarchical reinforcement learning.
* Machine Reading Comprehension: predicting answer spans given context, i.e., extracting text spans in passages given queries.
    * One line of work simplifies this into two multi-class classification tasks (predicting the answer's start and end positions).
    * For multi-passage MRC, one approach directly concatenates the passages; another first ranks the passages and then runs single-passage MRC on the selected passage.
    * Also useful: pretraining methods like BERT or ELMo.
    * Trend: a tendency of casting non-QA NLP tasks as QA tasks; representative models include BiDAF and QANet.
Inspiration for this work comes from Levy et al. (2017), who formalize relation extraction as a single-turn QA task of identifying the relation between two predefined entities, and from McCann et al. (2018).
Idea
* Model the hierarchical tag dependency via multi-turn QA, identifying answer spans from the context.
* Each entity type and relation type is characterized by a question-answering template; entities and relations are extracted by answering the template questions.
* The question query encodes prior information, jointly modeling entities and relations.
* Exploit well-developed machine reading comprehension (MRC) models, with multi-step reasoning to construct entity dependencies.
advantages
* Captures the **hierarchical dependency of tags**: the entities needed for the next turn are obtained progressively, closely akin to **the multi-turn slot-filling dialogue system**.
* The question query encodes important **prior information** for the relation class we want to identify.
* The QA framework provides a natural way to simultaneously extract entities and relations: most MRC models support outputting a special NONE token, indicating that there is no answer to the question.
dataset
* ACE04, ACE05 and the CoNLL04 corpora, plus a newly developed Chinese dataset, RESUME, for extracting biographical information of individuals from raw texts. Constructing a structured knowledge base from RESUME requires four or five turns of QA (sketched below).
* The defining characteristic: one person can work for **different** companies during **different** periods of time, and can hold **different** positions during **different** periods of time at the **same** company.
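A minimal sketch of what such a multi-turn exchange could look like; the question wordings and answers are invented for illustration, not the paper's exact templates:

```python
# Hypothetical four-turn QA trace for one passage. Each later question
# is filled in with answers from earlier turns, which is how the
# hierarchical dependency is resolved step by step.
turns = [
    {"q": "Who is mentioned in the text?",      "a": ["Musk"]},
    {"q": "Which companies did Musk work for?", "a": ["SpaceX", "Tesla"]},
    {"q": "When did Musk work for SpaceX?",     "a": ["2002-present"]},
    {"q": "What position did Musk hold at SpaceX during 2002-present?",
     "a": ["CEO"]},
]
```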
model
* The task is decomposed into two subtasks: a multi-answer task for head-entity extraction plus a single-answer task for joint relation and tail-entity extraction.
* Stage 1: head-entity extraction. To extract this starting entity, each entity type is **transformed into a question** using EntityQuesTemplates. The entities extracted at this stage are not necessarily head entities.
* Stage 2: relation and tail-entity extraction. A relation chain is defined for the multi-turn QA, because some extractions depend on other extractions.
Generating Questions using Templates
Questions are type-specific: either natural language questions or pseudo-questions (see the sketch below).
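A minimal sketch of template-based question generation; the template strings and relation names here are invented placeholders, not the paper's actual templates:

```python
# Entity templates (turn 1): a pseudo-question can be as terse as the
# type keyword, while a natural language question is fully worded.
ENTITY_TEMPLATES = {
    "Person":  "person",                                      # pseudo-question
    "Company": "Which companies are mentioned in the text?",  # natural language
}

# Relation templates (later turns) are parameterized by answers
# extracted in earlier turns.
RELATION_TEMPLATES = {
    ("Person", "works_for"):     "Which companies did {head} work for?",
    ("Person", "position_held"): "What position did {head} hold at {company}?",
}

def make_question(template_key, **slots):
    """Fill a relation template with entities from earlier turns."""
    return RELATION_TEMPLATES[template_key].format(**slots)

print(make_question(("Person", "works_for"), head="Musk"))
# -> Which companies did Musk work for?
```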
Extracting Answer Spans via MRC
* Backbone: BERT. Traditional MRC models are adjusted to the multi-turn QA setting: instead of predicting a single start/end position pair, the model predicts a BMEO (beginning, inside, ending and outside) label for each context token, which allows a question to have multiple answer spans (a decoding sketch follows after this subsection).

Training and Test
* Joint loss: $$\mathcal{L} = (1 - \lambda)\,\mathcal{L}(\text{head-entity}) + \lambda\,\mathcal{L}(\text{tail-entity, rel})$$
* The two subtasks share parameters during training; at test time, head-entities and tail-entities are extracted separately. λ controls the tradeoff between the two subtasks.
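A minimal sketch of reading answer spans off a BMEO tag sequence, under the common convention that a span runs from a B tag to the next E tag (decoding details assumed):

```python
def decode_spans(tokens, tags):
    """Collect all B...E answer spans; an all-O sequence means the
    question has no answer in the context (the NONE case above)."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.append((start, i))
            start = None
        elif tag == "O":
            start = None
    return [tokens[s:e + 1] for s, e in spans]

tokens = ["Elon", "Musk", "founded", "Space", "X"]
tags   = ["B",    "E",    "O",       "B",     "E"]
print(decode_spans(tokens, tags))
# -> [['Elon', 'Musk'], ['Space', 'X']]  (two answers to one question)
```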
Reinforcement Learning
* The answer extracted in one turn also affects downstream turns, and hence later accuracies. Since reinforcement learning has produced good results for multi-turn dialogue generation, it is adopted here as well (Mrkšić et al., 2015; Li et al., 2016; Wen et al., 2016).
* Action: selecting a text span in each turn.
* Policy: the probability of selecting a certain span given the question and the context:
$$p(y(w_1, \ldots, w_n) = \text{answer} \mid \text{question}, s) = p(w_1 = \mathrm{B}) \times p(w_n = \mathrm{E}) \prod_{i \in [2, n-1]} p(w_i = \mathrm{M})$$
* Reward: for a given sentence, the correctly extracted triples. Training maximizes the expected reward $E_{\pi}[R(w)]$, with the gradient approximated by sampling from the policy $\pi$ via the likelihood-ratio trick:
$$\nabla E(\theta) \approx [R(w) - b]\, \nabla \log \pi(y(w) \mid \text{question}, s)$$
where $b$ is a baseline value (the average of all previous rewards). Each correctly answered turn earns a reward of +1, and the final reward is the reward accumulated over all turns (a toy sketch follows below).
* Policy network initialization: the pre-trained head-entity and tail-entity extraction model.
* Experience replay strategy: for each batch, half of the examples are simulated and the other half is randomly selected from previously generated examples.
* A curriculum learning strategy is applied on the RESUME dataset, gradually increasing the number of turns from 2 to 4 during training.
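A toy numerical sketch of the policy's span probability and the baseline-adjusted REINFORCE weight, with invented probabilities (not the paper's code):

```python
import math

def span_log_prob(p_first_B, p_middle_M, p_last_E):
    """log p(span) = log p(w1=B) + sum_i log p(wi=M) + log p(wn=E)."""
    return (math.log(p_first_B)
            + sum(math.log(p) for p in p_middle_M)
            + math.log(p_last_E))

past_rewards = []
def reinforce_weight(reward):
    """Return R - b, with baseline b = mean of all previous rewards."""
    b = sum(past_rewards) / len(past_rewards) if past_rewards else 0.0
    past_rewards.append(reward)
    return reward - b

logp = span_log_prob(0.9, [0.8, 0.7], 0.85)  # a sampled 4-token span
w = reinforce_weight(reward=2.0)             # e.g., 2 turns answered correctly
# The gradient estimate scales the gradient of logp by w:
# grad E(theta) ≈ w * d(logp)/d(theta)
```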
Experimental Results (SOTA results)
* Metrics: micro-F1 scores, precision and recall.
* Results on the newly constructed RESUME dataset, against two purpose-built baselines: tagging+relation and tagging+dependency.
    * tagging+relation: entities are extracted with BERT tagging models, and relations with a CNN over the representations output by the BERT transformers.
    * tagging+dependency: the task is akin to a dependency parsing task at the tag level rather than the word level, so a BERT tagging model first assigns tagging labels to each word, and the SOTA dependency parsing model Biaffine is adapted to construct dependencies between the tags (the two are jointly trained; a scoring sketch follows below).
* Results on the commonly used ACE04, ACE05 and CoNLL04 corpora.
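For reference, a generic numpy sketch of the biaffine arc scorer that such a parser builds on (Dozat and Manning style); the dimensions and parameters here are illustrative, not the baseline's actual configuration:

```python
import numpy as np

# score(h_dep, h_head) = h_dep^T U h_head + w^T [h_dep; h_head] + b
d = 4
rng = np.random.default_rng(0)
U = rng.normal(size=(d, d))    # bilinear term
w = rng.normal(size=(2 * d,))  # linear term
b = 0.0

def biaffine_score(h_dep, h_head):
    return h_dep @ U @ h_head + w @ np.concatenate([h_dep, h_head]) + b

# h_dep / h_head stand in for BERT representations of two tagged words
# whose tag-level dependency is being scored.
print(biaffine_score(rng.normal(size=d), rng.normal(size=d)))
```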
Ablation Studies
* Effect of Question Generation Strategy: natural language questions work better than pseudo-questions, because they provide more fine-grained semantic information.
* Effect of Joint Training: λ is tested at intervals of 0.1. Notably, entity extraction is not best at λ = 0, which shows that the relation extraction part can improve entity extraction.
* Case Study: compared with the SOTA MRT model, the proposed model can identify entities that are far apart, and it can also handle sentences that contain two pairs with the same relation.
future directions
The framework could easily integrate reinforcement learning (just as in multi-turn dialog systems).