Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models, in smoking studies

Frigau, Luca;
2025-01-01

Abstract

Large-scale health surveys suitable for addiction studies furnish self-reported data, which consequently suffer from a form of measurement error called heaping that statisticians have been grappling with for decades. Also known as digit preference, the aberration is often characterized by spikes at multiples of 10 or 5 that arise from rounding. To date, methods and software for heaped (and seeped) data have been largely wanting. We identify three generic problems for simple addiction studies and solve them with a newly developed technique called Generally Altered, Inflated, Truncated and Deflated (GAITD) regression for counts, applied to a recent National Health and Nutrition Examination Survey data set. In conjunction, we propose the class of Doubly constrained Reduced-Rank vector generalized linear models (DRR-VGLMs), whereby the reduced-rank regression is afforded structure by way of linear constraints; this allows further simplification of the dimension reduction. We determine the distribution of smoking initiation age (SIA) and its association with tobacco consumption and smoking duration: for example, is a lower SIA associated with higher tobacco consumption later in life? Is a higher SIA associated with shorter smoking duration among quitters? Together, GAITD regression and DRR-VGLMs hold promise for heaped and seeped data.
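To make the notion of heaping concrete, the following is a minimal sketch of how rounding by some respondents produces spikes at multiples of 5. It is not GAITD regression and not the authors' method; the function names (`heap`, `share_at_multiples`), the uniform age range, and the 30% rounding probability are all assumptions chosen purely for illustration.

```python
import random


def heap(values, heap_prob, base=5, seed=0):
    """Round each integer value to the nearest multiple of `base`
    with probability `heap_prob` (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    reported = []
    for v in values:
        if rng.random() < heap_prob:
            # Integer round-half-up to the nearest multiple of `base`.
            v = base * ((v + base // 2) // base)
        reported.append(v)
    return reported


def share_at_multiples(values, base=5):
    """Fraction of values that fall exactly on a multiple of `base`."""
    return sum(v % base == 0 for v in values) / len(values)


# Simulated "true" ages (assumed uniform on 10..29, not real survey data).
rng = random.Random(1)
true_ages = [rng.randint(10, 29) for _ in range(10_000)]

# Assume 30% of respondents round their answer to a multiple of 5.
reported_ages = heap(true_ages, heap_prob=0.3)

print("share at multiples of 5, true:    ", share_at_multiples(true_ages))
print("share at multiples of 5, reported:", share_at_multiples(reported_ages))
```

The reported values show a markedly larger share at multiples of 5 than the underlying values, which is the kind of spike that a model for heaped counts has to accommodate.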
2025
Addiction science; count responses; finite mixture distribution; generally altered; inflated; multinomial logit model; reduced rank regression; truncated and deflated regression
Files in this item:
File Size Format
Heaping and seeping, GAITD regression and doubly constrained reduced-rank vector generalized linear models, in smoking studies.pdf

open access

Type: post-print version (AAM)
Size 748.5 kB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/460641
Citations
  • PMC: ND
  • Scopus: 1
  • Web of Science (ISI): 0