Generalized information adaptation: Reducing annotation effort in data analysis

Even in the era of Big Data, the need for manually annotated data remains a critical bottleneck in developing automated prediction systems. A key strategy for reducing reliance on human annotation is to exploit labeled data that already exists in related domains. Generalized information adaptation provides mechanisms for such cross-domain data sharing, with the aim of reducing human effort and making system development more autonomous. In this talk, I will first introduce the problem of generalized information adaptation and then discuss a particularly challenging instance, cross-lingual learning, in which annotated data in one language is used to improve the training of a prediction model in a different language. To achieve meaningful transfer between languages with disjoint feature spaces, I will present a matrix-completion-based learning method that supports the automatic transfer of statistical features. The learning problem is formulated as a convex optimization problem that admits an efficient training algorithm converging to a global optimum. The approach is then extended to semi-supervised learning by incorporating additional label information into the matrix completion process. I will demonstrate these methods on cross-lingual learning problems in which the source and target domains are expressed in different languages with no feature overlap.
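
For readers unfamiliar with matrix completion as a convex problem, the following is a minimal sketch of one standard formulation, nuclear-norm-regularized completion solved by iterative singular-value soft-thresholding. It is an illustration under stated assumptions, not the method presented in the talk: the function names (`soft_impute`, `demo`), the regularization weight `lam`, and the synthetic low-rank matrix standing in for a document-by-feature matrix are all hypothetical.

```python
import numpy as np


def soft_impute(X, observed, lam=1.0, n_iters=500, tol=1e-6):
    """Approximately solve  min_Z 0.5*||P_obs(X - Z)||_F^2 + lam*||Z||_*
    by repeatedly filling the missing entries with the current estimate
    and soft-thresholding the singular values (a proximal-gradient scheme).
    """
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iters):
        # Keep observed entries of X, use the current estimate elsewhere.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrinking singular values by lam is the proximal step
        # for the nuclear norm.
        s = np.maximum(s - lam, 0.0)
        Z_new = (U * s) @ Vt
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


def demo(seed=0):
    """Complete a synthetic rank-2 matrix with roughly 20% of entries hidden."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for a document-by-feature matrix in which some
    # feature values are unobserved.
    truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
    observed = rng.random(truth.shape) < 0.8
    Z = soft_impute(np.where(observed, truth, 0.0), observed, lam=1.0)
    missing = ~observed
    # Relative error on the entries the algorithm never saw.
    rel_err = np.linalg.norm((Z - truth)[missing]) / np.linalg.norm(truth[missing])
    return Z, rel_err
```

Calling `demo()` returns the completed matrix together with the relative error on the hidden entries. Because the objective is convex, the result does not depend on a lucky initialization, which is the property that makes this family of formulations attractive for transferring statistics across disjoint feature spaces.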