Research Resources

This is a repository for research resources that might be of interest to academics and practitioners who use discrete choice methods. Below, you can find access to datasets that might be helpful for your analysis, code in various programming languages for the estimation of different discrete choice models, and working papers. The Institute is deeply committed to the open data and open source movement. The objective of this repository is to encourage opportunities for further analysis, replication, verification and refinement. 

Working Papers

Seasonality Effect on US Household Demand for Different Beef Cuts

Ardeshiri and Swait

Abstract: Australia is one of the largest exporters of beef and beef products to the United States (Haley & Jones, 2017). A better understanding of American demand for beef is important because Australia faces strong competition from Canada and New Zealand in the beef market. We applied a discrete choice experiment to investigate the preferences and willingness-to-pay (WTP) of 946 American consumers for different beef products. Consumers were presented with a novel experiment in which they indicated “how many” they would purchase of ground, diced and roast beef and of six cuts of steak (sirloin, tenderloin, flank, flap, New York and cowboy/rib-eye).

The results from a scale-adjusted ordered logit model showed that, after price, cues related to food safety, such as a certification logo, type of packaging, and antibiotic-free and organic labelling, play a stronger role in American consumers’ decision making than other beef attributes. This holds especially in summer, when warm weather gives foodborne bacteria more opportunity to thrive.

Furthermore, on average, US consumers purchase diced and roast products more often in winter, “the slow-cooking season”, than in summer, whereas New York strip and flank steak are more popular in summer, “the grilling season”, than in winter.

Finally, this study provides managerial and policy implications and recommendations to help Australian exporters better understand US consumer preferences for beef by accounting for seasonal effects on demand.

Download Here

Flexible Mixture-Amount Models for Business and Industry using Gaussian Processes

Ruseckaite, Fok and Goos

Abstract: Many products and services can be described as mixtures of ingredients whose proportions sum to one. Specialized models have been developed for linking the mixture proportions to outcome variables, such as preference, quality and liking. In many scenarios, only the mixture proportions matter for the outcome variable. In such cases, mixture models suffice. In other scenarios, the total amount of the mixture matters as well. In these cases, one needs mixture-amount models. As an example, consider advertisers who have to decide on the advertising media mix (e.g. 30% of the expenditures on TV advertising, 10% on radio and 60% on online advertising) as well as on the total budget of the entire campaign. To model mixture-amount data, the current strategy is to express the response in terms of the mixture proportions and specify mixture parameters as parametric functions of the amount. However, specifying the functional form for these parameters may not be straightforward, and using a flexible functional form usually comes at the cost of a large number of parameters. In this paper, we present a new modeling approach which is flexible but parsimonious in the number of parameters. The model is based on so-called Gaussian processes and avoids the necessity to a-priori specify the shape of the dependence of the mixture parameters on the amount. We show that our model encompasses two commonly used model specifications as extreme cases. Finally, we demonstrate the model’s added value when compared to standard models for mixture-amount data. We consider two applications. The first one deals with the reaction of mice to mixtures of hormones applied in different amounts. The second one concerns the recognition of advertising campaigns. The mixture here is the particular media mix (TV and magazine advertising) used for a campaign. As the total amount variable, we consider the total advertising campaign exposure.

Download Here

Modeling and Forecasting the Evolution of Preferences over Time: A Hidden Markov Model of Travel Behavior

El Zarwi, Vij and Walker

Abstract: Preferences, as denoted by taste parameters and consideration sets, may evolve over time in response to changes in demographic and situational variables, psychological, sociological and biological constructs, and available alternatives and their attributes. However, existing representations typically overlook the influence of past experiences on present preferences. This study develops a hidden Markov model with a discrete choice kernel for modeling and forecasting the evolution of individual preferences over time. The hidden states denote different latent preferences, and the evolutionary path is hypothesized to be a first order Markov process such that an individual’s preferences during a particular time period are dependent on their preferences during the previous time period. The framework is applied to study the evolution of modal preferences, or modality styles, over time, in response to a major change in the public transportation system. Empirical findings reveal two complementary narratives. At the population level, there are significant shifts in the distribution of individuals across modality styles before and after the change in the system, but the distribution is relatively stable in the periods after the change. At the individual level, greater instability in preferences is observed, much after the change, despite accounting for the inertial influence of past preferences. A comparison between the proposed dynamic framework and comparable static frameworks reveals corresponding differences in aggregate forecasts for different policy scenarios, demonstrating the value of the proposed framework for both individual and population level policy analysis.

Download Here

Moving past random taste heterogeneity in discrete choice models: Multivariate nonparametric finite mixture distributions

Vij, A.

Abstract: This study develops an expectation maximization algorithm for the estimation of mixed logit models with multivariate nonparametric finite mixture distributions, where the support of the distribution is specified as a high-dimensional grid over the coefficient space, with equal or unequal intervals between successive points along the same dimension, and the location of each point on the grid and the probability mass at that point are model parameters that need to be estimated. The framework does not require the analyst to specify the shape of the distribution prior to model estimation, but can approximate any multivariate probability distribution function to any arbitrary degree of accuracy. The estimation algorithm can feasibly estimate behaviorally meaningful models with multivariate distributions over high-dimensional coefficient spaces with hundreds of mass points. Multiple synthetic datasets and a case study on travel mode choice behavior are used to demonstrate the value of the model framework and estimation algorithm. The literature on discrete choice models is replete with ways to incorporate random taste heterogeneity. By proposing a fully flexible and computationally tractable approach, this study aims to bring to a close the question of how best to include random taste heterogeneity within existing representations of decision-making.

Download Here


California Household Travel Survey 2012:
Tour mode choice data from the San Francisco Bay Area

This data was originally collected as part of the California Household Travel Survey (CHTS) in the year 2012. Individuals belonging to sampled households were asked to report their complete activity diary data over an observation period of one day, including which activities were conducted where, when, for how long, with whom and using what mode of travel. More information on the raw data can be found in NuStats, LLC (2013).

The data included here corresponds to individuals from the subset of households located in the nine-county San Francisco Bay Area. The raw trip data was processed into home-based tours that can be used for the purpose of tour-based travel mode choice analysis. The resulting dataset includes 27,054 tours made by 17,717 individuals from 8,228 households.
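As a rough illustration of what processing trips into home-based tours involves, the sketch below groups one person's time-ordered trips into chains that start and end at home. It is not the processing script included in the download: the field names ("origin", "destination", "mode") and the rule of dropping chains that never return home are assumptions made for this example only.

```python
from typing import Dict, List


def split_into_home_based_tours(trips: List[Dict]) -> List[List[Dict]]:
    """Group one person's time-ordered trips into home-based tours.

    A tour opens with a trip that leaves home and closes with the first
    later trip that arrives back at home. Trips that do not belong to a
    closed home-to-home chain are dropped (an assumption of this sketch).
    """
    tours: List[List[Dict]] = []
    current: List[Dict] = []
    for trip in trips:
        if trip["origin"] == "home":
            current = [trip]        # start (or restart) a tour at home
        elif current:
            current.append(trip)    # continue the tour that is open
        else:
            continue                # trip outside any home-based chain
        if trip["destination"] == "home":
            tours.append(current)   # returning home closes the tour
            current = []
    return tours


# Hypothetical one-day diary for a single individual.
diary = [
    {"origin": "home", "destination": "work", "mode": "drive"},
    {"origin": "work", "destination": "lunch", "mode": "walk"},
    {"origin": "lunch", "destination": "work", "mode": "walk"},
    {"origin": "work", "destination": "home", "mode": "drive"},
    {"origin": "home", "destination": "store", "mode": "bike"},
    {"origin": "store", "destination": "home", "mode": "bike"},
    {"origin": "home", "destination": "park", "mode": "walk"},  # never returns
]
tours = split_into_home_based_tours(diary)
tour_lengths = [len(tour) for tour in tours]
```

In this hypothetical diary the last trip never returns home, so it does not form a tour. A real pipeline would also need a rule for assigning a single tour mode to a chain of trips made by different modes.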

For each tour, six possible travel mode alternatives are defined: private vehicle, private transit, walk to public transit, drive to public transit, bike, and walk. Private vehicle refers to cases where the individual used a motorized vehicle owned by themselves (or someone they know) as a driver or a passenger. Private transit includes the use of travel modes such as taxis, Uber, carshare, rental cars and private shuttles. Walk to public transit captures all cases in which an individual only used non-motorized travel modes to access public transit, and drive to public transit captures all cases in which a motorized travel mode was used to access public transit.

The level-of-service attributes, namely travel times and costs, for each of the six travel modes for each tour are determined using network skims from the SF MTC for 2010, generated using version 3 of their travel demand model. We are unable to decompose travel time into its constituent elements, such as in-vehicle time and waiting time, as this information was unavailable at the time of processing. Travel costs are in 2000 US dollars.

The download link below contains five files: the processed data file, the Python script used to process the raw data, an IPython notebook that shows how to use the data file for analysis, the data dictionary for the raw data, and a readme file.

A subset of this data was originally used by Vij et al. (2017) for understanding modal preference shifts in the San Francisco Bay Area over time. For more details, please refer to the original study. And if you have any questions, feel free to contact

Download Here


NuStats, LLC (2013). 2010–2012 California Household Travel Survey Final Report.

Vij, A., Gorripaty, S., & Walker, J. L. (2017). From trend spotting to trend’splaining: Understanding modal preference shifts in the San Francisco Bay Area. Transportation Research Part A: Policy and Practice, 95, 238-258.

Estimation Code

Python estimation code for flexible Latent Class Choice Models (LCCMs)

LCCM is a Python package for estimating latent class choice models using the Expectation Maximization (EM) algorithm to maximize the likelihood function. The package was developed by Feras El Zarwi, a PhD candidate at the University of California, Berkeley, with assistance from Akshay Vij from the Institute for Choice. The package offers several improvements over other estimation packages, some of which are listed below:

  • Supports datasets with multiple observations per decision-maker
  • Supports datasets where the choice set differs across observations
  • Supports model specifications where the coefficient for a given variable may be generic (same coefficient across all alternatives) or alternative specific (coefficients varying across all alternatives or subsets of alternatives) in each latent class
  • Accounts for sampling weights when the data are choice-based, i.e. uses Weighted Exogenous Sample Maximum Likelihood (WESML; Ben-Akiva and Lerman, 1983) to yield consistent estimates
  • Allows the choice set to be constrained by latent class, so that each latent class can have its own subset of the alternatives in the choice set
  • Allows the availability of latent classes to be constrained by individual, so that a certain latent class or set of latent classes can be made unavailable to certain decision-makers
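To make the features above concrete, here is a minimal NumPy sketch of the weighted log-likelihood that an EM routine for a latent class choice model would maximize. It is not the LCCM package's API: the function name, argument names and array layouts are assumptions chosen for illustration, utilities and class shares are taken as given rather than computed from coefficients, and every class is assumed to contain at least one alternative.

```python
import numpy as np


def lccm_log_likelihood(utilities, chosen, person_ids, class_shares,
                        class_alt_avail, class_person_avail, weights):
    """Weighted log-likelihood of a latent class logit (illustrative only).

    utilities          (S, N, J) utility of alternative j, class s, observation n
    chosen             (N,)      index of the chosen alternative
    person_ids         (N,)      decision-maker index in 0..P-1 for each observation
    class_shares       (P, S)    class membership probabilities before masking
    class_alt_avail    (S, J)    1 if alternative j is in class s's choice set
    class_person_avail (P, S)    1 if class s is available to person p
    weights            (P,)      sampling weights (WESML-style)
    """
    n_obs = utilities.shape[1]
    n_persons = class_shares.shape[0]

    # Class-specific logit probabilities; excluded alternatives get zero mass.
    masked = np.where(class_alt_avail[:, None, :] > 0, utilities, -np.inf)
    masked = masked - masked.max(axis=2, keepdims=True)
    exp_u = np.exp(masked)
    probs = exp_u / exp_u.sum(axis=2, keepdims=True)       # (S, N, J)
    p_chosen = probs[:, np.arange(n_obs), chosen]           # (S, N)

    # Multiply probabilities over each person's repeated observations.
    # (Plain products can underflow for long panels; fine for a sketch.)
    seq = np.ones((n_persons, p_chosen.shape[0]))           # (P, S)
    np.multiply.at(seq, person_ids, p_chosen.T)

    # Drop classes a person cannot belong to and renormalise the shares.
    shares = class_shares * class_person_avail
    shares = shares / shares.sum(axis=1, keepdims=True)

    person_lik = (shares * seq).sum(axis=1)                 # (P,)
    return float((weights * np.log(person_lik)).sum())


# Toy data: 2 classes, 3 alternatives, 3 observations from 2 people.
ll = lccm_log_likelihood(
    utilities=np.zeros((2, 3, 3)),
    chosen=np.array([0, 2, 1]),
    person_ids=np.array([0, 0, 1]),
    class_shares=np.full((2, 2), 0.5),
    class_alt_avail=np.array([[1, 1, 1], [1, 1, 0]]),   # class 1 excludes alt 2
    class_person_avail=np.ones((2, 2)),
    weights=np.ones(2),
)
```

With all utilities set to zero, each class assigns equal probability to the alternatives in its own choice set, so the toy value can be checked by hand: person 0 contributes 0.5 × (1/3)² = 1/18 and person 1 contributes 0.5 × 1/3 + 0.5 × 1/2 = 5/12.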

For more information about the estimation code, see El Zarwi (2017). If the package is useful in your research or work, please cite the dissertation referenced below and the package itself. For any questions, please contact Feras at


El Zarwi, Feras. "Modeling and Forecasting the Impact of Major Technological and Infrastructural Changes on Travel Demand", PhD Dissertation, 2017, University of California at Berkeley.
