On the treatment of the repeated observations in panel data: efficiency of Mixed Logit parameter estimates
CHERCHI, ELISABETTA;
2011-01-01
Abstract
Travel demand models are usually estimated using cross-sectional data. Although the use of panel data has recently increased in many areas, many aspects have not yet been fully analysed. Unexplored topics include the optimal length of panel surveys and the related issue of how to model panel data correctly in the presence of repeated observations (for example, several trips per week by people in a panel with waves every six months), and whether, and to what extent, this affects the efficiency of the estimated parameters and their capability to replicate the true situation. In this paper we analyse this issue and test the effect of including journeys made, with the same characteristics, several times in a week. A broad variety of models, accounting for fixed parameters but also for random heterogeneity and correlation among individuals, were estimated using real and synthetic data. The real data come from the Santiago Panel (2006-2008), while the synthetic data were appropriately generated to examine the same problem in a controlled experiment. Our results show that having more observations per individual increases the probability of capturing more effects (i.e. different types of heteroscedasticity), but having identical observations in a data panel reduces the capability to reproduce the true phenomenon. Consequently, defining the length of a panel survey requires considering its implicit level of routine (i.e. the proportion of identical observations).