Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models, in smoking studies
Frigau, Luca;
2025-01-01
Abstract
Large-scale health surveys suitable for addiction studies furnish self-reported data that consequently suffer from a form of measurement error called heaping, which statisticians have been grappling with for decades. Also known as digit preference, this aberration is typically characterized by spikes at multiples of 10 or 5 caused by rounding. To date, methods and software for heaped (and seeped) data have been largely lacking. We identify three generic problems in simple addiction studies and solve them with a newly developed technique called Generally Altered, Inflated, Truncated and Deflated (GAITD) regression for counts, applied to a recent National Health and Nutrition Examination Survey data set. In conjunction, we propose the class of doubly constrained reduced-rank VGLMs (DRR-VGLMs), in which the reduced-rank regression is given structure by way of linear constraints; this allows the dimension reduction to be simplified further. We determine the distribution of smoking initiation age (SIA) and its association with tobacco consumption and smoking duration, addressing questions such as: Is a lower SIA associated with higher tobacco consumption later in life? Is a higher SIA associated with shorter smoking duration among quitters? Together, GAITD regression and DRR-VGLMs hold promise for heaped and seeped data.

| File | Size | Format | Access | Type |
|---|---|---|---|---|
| Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models, in smoking studies.pdf | 748.5 kB | Adobe PDF | Open access | Post-print version (AAM) |
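The abstract describes heaping as spikes at multiples of 10 or 5. The following Python sketch is a rough illustration of what that means for a count distribution. It builds a Poisson probability mass function and adds extra probability at a few heaped values, in the spirit of the "Inflated" part of GAITD. It is a simplified, hypothetical example covering nonparametric inflation only. It is not the authors' GAITD implementation, which also handles alteration, truncation and deflation (seeping), and it uses no particular package. The function names, the rate `lam = 17.0`, and the inflation weights at ages 15 and 20 are assumptions chosen purely for illustration.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Ordinary (parent) Poisson probability P(Y = y) for rate lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work on the log scale for numerical stability with larger y.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def inflated_poisson_pmf(y: int, lam: float, inflation: dict[int, float]) -> float:
    """Poisson pmf with extra (nonparametric) mass at selected support points.

    Hypothetical sketch: `inflation` maps each heaped value to an extra
    probability omega_j; the parent Poisson is scaled by (1 - sum omega_j)
    so that the mixture still sums to one.
    """
    total = sum(inflation.values())
    if not 0.0 <= total < 1.0:
        raise ValueError("inflation probabilities must sum to a value in [0, 1)")
    return (1.0 - total) * poisson_pmf(y, lam) + inflation.get(y, 0.0)


if __name__ == "__main__":
    weights = {15: 0.05, 20: 0.08}  # hypothetical heaping at ages 15 and 20
    for age in range(13, 23):
        print(age, round(inflated_poisson_pmf(age, 17.0, weights), 4))
```

With these made-up parameters, the printed probabilities rise at 15 and 20 above their neighbours on an otherwise smooth Poisson curve. Those spikes are the visual signature of heaping. A regression approach would additionally let the parent rate and the inflation probabilities depend on covariates.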
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


