Sparse Regression via Range Counting
Aurélien Ooms
Université libre de Bruxelles

The sparse regression problem, also known as the best subset selection problem, can be cast as follows: Given a real d\times n matrix A, a vector y in \mathbb{R}^d, and an integer 2\leq k\leq d, find an affine combination of at most k columns of A that is closest to y. We describe an O(n^{k-1}\log^{d-k+2}n)-time randomized (1+\epsilon)-approximation algorithm for this problem with d and \epsilon constant. This is the first algorithm for this problem running in time o(n^k). Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where k=d=O(1), we also provide a simple O(n^{d-1+\epsilon})-time deterministic exact algorithm. Finally, we show how to adapt the approximation algorithm to the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.
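To make the problem statement concrete, here is a minimal NumPy sketch of the naive O(n^k)-time brute force that the paper's o(n^k) algorithm improves upon: enumerate all subsets of at most k columns and solve a constrained least-squares problem in each. The function name and interface are illustrative, not from the paper.

```python
# Naive baseline for the sparse (affine) regression problem: over all subsets
# S of at most k columns of A, find coefficients summing to 1 that minimize
# the distance from the affine combination A[:, S] @ lam to y.
from itertools import combinations
import numpy as np

def best_affine_fit(A, y, k):
    """Return (best squared distance, column subset, coefficients)."""
    d, n = A.shape
    best = (np.inf, None, None)
    for m in range(1, k + 1):
        for S in combinations(range(n), m):
            B = A[:, S]
            if m == 1:
                lam = np.array([1.0])  # single column: its coefficient is 1
            else:
                # Enforce sum(lam) == 1 by substituting
                # lam[-1] = 1 - sum(lam[:-1]) into the least-squares problem.
                M = B[:, :-1] - B[:, [-1]]
                mu, *_ = np.linalg.lstsq(M, y - B[:, -1], rcond=None)
                lam = np.append(mu, 1.0 - mu.sum())
            dist = float(np.sum((B @ lam - y) ** 2))
            if dist < best[0]:
                best = (dist, S, lam)
    return best
```

Each subset costs a single least-squares solve, so the total running time is dominated by the \binom{n}{k} = O(n^k) subsets, which is exactly the barrier the abstract's O(n^{k-1}\log^{d-k+2}n) bound breaks.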