Active Learning and Language Models for Web Information Extraction

This project studies how to automatically extract large knowledge bases from the Web. We aim to develop techniques that can integrate the Web's tabular and textual data into a coherent knowledge base.

Questions we're interested in include:

Try our demo of Wikipedia-based Table Extraction, along with the associated data and other resources.

Publications and associated resources:


This material is based upon work supported by the National Science Foundation under Grant Number 1016754. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.