The CAC Score Study

Why the sampling design is the foundation of the CAC Score

When we publish a CAC Score next to a casino on cacpaloalto.org, that number out of 100 is only as trustworthy as the way we gathered the opinions behind it. This second part of our four-part methodology series is about exactly that gathering. It is the paper where we open the hood and show you how we sampled California, how we decided who counted as a respondent, what we asked them, and how we are entitled to claim a margin of error of plus or minus 1.51 percent at the 95 percent confidence level. If the first part of this series explained why we built the CAC Score and what its eight pillars mean, this part explains how we earned the right to put a precise number on each of those pillars.

We want to be plain about our motivation. There is a great deal of casino content on the internet that presents ratings as if they were facts handed down from on high, with no description of where the numbers came from. We started CA Casinos, Palo Alto Organization in Palo Alto with a small player panel precisely because we were tired of that. We wanted a rating that a California player could interrogate. So this paper is written in the spirit of a methods section in a peer-reviewed journal. We describe the target population, the sampling frame, the eligibility and consent procedures, the survey instrument, the random and snowball sampling methods, the sample-size mathematics, and the validity, reliability and limitations of the whole exercise. Where a design choice rests on an established method, we cite the source so you can check it yourself.

Everything in the CAC Score is read through a California lens. Only legally eligible California residents aged 21 or over count toward the survey, and the regional and demographic weights are calibrated to California, not to the United States as a whole. A casino that pays fast and treats Texas players well but blocks or slows California withdrawals will not score well with us, because our respondents are Californians.

Throughout this paper we use the word "we" to mean the research team at CA Casinos, Palo Alto Organization. The total verified sample described here is N = 4,217 California residents aged 21 and over who play or intend to play online casino games. That single sample sits behind every CAC Score we publish, and behind every figure in this series. We will return to that number many times, because in survey research the sample is not a detail. It is the thing that makes inference possible.

The target population and the sampling frame

Every sample is a stand-in for a larger population, and the first job of any honest methodology is to define that population precisely. Our target population is California residents aged 21 or over who currently play online casino games or who intend to start. That definition has three parts, and each one matters.

First, residency. We are interested in Californians, because the CAC Score is a California rating. A casino that is wonderful for players in New Jersey but unavailable, throttled or unfriendly to a Californian is not a good California casino, and our population definition forces that distinction. Second, age. The legal threshold we apply is 21 or over. We do not sample anyone younger, and we verify age and residency before a respondent is admitted, which we describe in detail below. Third, the behavior. We are interested in people who play or who genuinely intend to play, not in the general adult public. Someone who has never gambled and has no intention of starting cannot tell us how quickly a withdrawal arrived or whether wagering requirements were explained clearly, so they fall outside the population we care about.

The sampling frame is the practical list or mechanism from which we actually draw respondents. In an ideal world the frame would be a perfect register of every eligible Californian online-casino player. No such register exists, and pretending one does would be the first dishonest move in a methodology. Instead we built a frame out of the channels through which California players can actually be reached: panel partners with verified California members, opt-in player communities, and direct recruitment across the state. We treat the gap between the true population and the reachable frame as a known source of bias rather than something to hide, and we discuss it openly in the limitations section. This is the same coverage-error thinking that Dillman and colleagues set out in their work on mixed-mode survey design, and it is why we recruit through several channels rather than one, so that no single channel's blind spots dominate the sample.

We estimate that our verified sample of 4,217 respondents represents a sampling fraction of roughly 1 percent of the estimated active California online-casino audience. That is a meaningful slice. It is large enough to support tight confidence intervals, as the mathematics later in this paper will show, and it is far larger than the convenience samples of a few hundred people that often sit behind casino "studies" elsewhere. A 1 percent sampling fraction also has a pleasant side effect for the statistics, because it keeps the finite population correction modest, a point we return to when we work through the sample-size formula.

From Palo Alto panel to a statewide field study

The project did not begin at 4,217 respondents. It began in Palo Alto with a small player panel, a group of local California players who were willing to answer detailed questions about their experiences. That panel taught us which questions actually discriminated between good and bad casinos, and which questions players found ambiguous or annoying. We then expanded to field sites across California, recruiting in every region, until the achieved sample mirrored the state. That California-wide, Palo-Alto-rooted origin is the reason the organization is named CA Casinos, Palo Alto Organization, and the reason the rating is the CAC Score. The pilot panel is not part of the 4,217; it was a separate, earlier phase used to refine the instrument. The numbers reported in this series come from the full statewide field study only.

Eligibility, consent and voluntary participation

Before we describe how respondents were chosen, we want to describe who was allowed to be a respondent at all and on what terms, because the ethics of recruitment are not separable from the quality of the data. A sample that includes people who should not have been in it, or who did not understand what they were agreeing to, is a contaminated sample no matter how clever the sampling mathematics.

Legal eligibility: California residents aged 21 or over

Every single respondent in the study was a legally eligible participant. In concrete terms that means each respondent was a California resident aged 21 or over who was able to lawfully take part. We did not rely on an honesty checkbox alone. Age and residency were verified before a person was included in the sample. If verification failed, the person was excluded and did not contribute to any number in this series. This is not a cosmetic rule. Because the entire purpose of the CAC Score is to describe what legally eligible California players experience, admitting an under-age respondent or a non-Californian would not merely be unethical, it would also corrupt the very quantity we are trying to measure. The eligibility rule and the measurement goal are the same rule seen from two angles.

Voluntary participation and informed consent

Participation was entirely voluntary from start to finish. No respondent was coerced, and no respondent was penalized for declining. Before answering anything, each respondent gave informed consent. They were told what the study was about, what kinds of questions they would be asked, roughly how long it would take, that their answers would be anonymized, and that the results would be published only in aggregate. Crucially, they were told that they could decline any individual question and that they could withdraw at any point without giving a reason and without any penalty. A respondent who started the survey and then stopped halfway was free to do so, and their partial data was handled according to the consent they had given.

We treat the right to decline and the right to withdraw as real rights, not as fine print. In practice this means our instrument was designed so that almost no question was forced. A player who did not want to disclose their gender, for instance, could leave it blank, which is part of why our gender breakdown includes an "other or undisclosed" category. Honoring the right to skip means we sometimes have slightly different denominators for different questions, and we account for that in analysis rather than pretending every respondent answered every item.

Anonymization and aggregate reporting

All responses were anonymized. We publish no personally identifying information about any respondent, anywhere, ever. The study reports aggregates only: percentages, means, standard deviations, confidence intervals and coded themes. When you read in this series that 64 percent of respondents play primarily on mobile, that figure is a property of the sample as a whole and cannot be traced back to any individual. Anonymization protects respondents, and it also encourages candor. A player who knows their answers cannot be linked to their name is more likely to tell us honestly that a casino was slow to pay or that support never answered, which is exactly the kind of uncomfortable truth a casino rating needs in order to be useful.

What we asked: the survey instrument

A sample is only as good as the questions put to it. We used a mixed-methods instrument, combining a structured quantitative core with an open-ended qualitative core, so that the numbers could be tested for precision and the human detail behind the numbers could be preserved. This section walks through both.

The quantitative core: five-point Likert batteries

The backbone of the quantitative instrument is a set of five-point Likert scales, in the tradition that Likert himself introduced in 1932. Each item presents a statement and asks the respondent to choose a level of agreement from 1 to 5, where 1 means strongly disagree and 5 means strongly agree. We did not ask a single question per pillar. Instead we used multi-item batteries, several statements per pillar, because a single question is fragile and a battery of related questions is far more reliable. Averaging several items that all probe the same underlying idea reduces the noise from any one oddly worded item, and it lets us measure internal consistency with Cronbach's alpha, which we discuss in the reliability section.
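To make the internal-consistency idea concrete, here is a minimal Python sketch of Cronbach's alpha for one multi-item battery. It uses the textbook formula, alpha = k/(k − 1) · (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the six-respondent battery are hypothetical illustrations, not our survey data or our production code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one battery.

    items: list of k lists, one per Likert item, each holding the 1-5
    answers of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total score across the battery.
    totals = [sum(answers) for answers in zip(*items)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical battery: three items answered by six respondents.
battery = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
]
alpha = cronbach_alpha(battery)
print(f"alpha = {alpha:.2f}")
```

When the items in a battery move together, the variance of the total score is large relative to the sum of the item variances, and alpha approaches 1. When the items are unrelated, alpha falls toward 0.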

The pillars are not arbitrary. Each one is grounded in an established framework, which we cover in depth in the frameworks discussion of this series. For the instrument, what matters is that each pillar has its own battery of Likert items, and that those items are phrased in plain language a California player would actually recognize. The table below shows one representative example item per pillar. In the real instrument each of these sat among several siblings probing the same construct from different angles.

| Pillar | Example Likert item (1 = strongly disagree, 5 = strongly agree) | Underlying framework |
|---|---|---|
| Trust & Licensing | "I trust this casino to hold and return my funds." | Mayer trust model (1995) |
| Payout Speed & Banking | "My withdrawals were paid within the time the casino promised." | Oliver expectation-confirmation (1980) |
| Bonuses & Value | "The wagering requirements were explained clearly before I opted in." | Prospect theory, Kahneman & Tversky (1979) |
| Customer Support | "When I contacted support, my issue was resolved promptly." | SERVQUAL, Parasuraman et al. (1988) |
| Mobile & Responsible Gambling | "The app or site was easy to use, and responsible-gambling tools were easy to find." | TAM, Davis (1989) |

Notice how each example item is concrete and behavioral. We did not ask "Do you trust this casino?" in the abstract. We asked whether the player trusts the casino to hold and return their funds, which is the specific behavior that the trust framework cares about. We did not ask "Was support good?" We asked whether, when the player contacted support, the issue was resolved promptly, which is the responsiveness and assurance dimension that SERVQUAL identifies. Concreteness matters because it reduces the chance that two respondents read the same item to mean two different things, and that reduction in ambiguity is one of the things that pushes our reliability coefficients into the acceptable-to-high range.

The would-recommend question (Net Promoter style)

Alongside the pillar batteries we asked a single would-recommend question, framed in the Net Promoter style that Reichheld set out in 2003. The wording was: "How likely are you to recommend this casino to another California player?" answered on a 0 to 10 scale. We deliberately localized the question to "another California player" rather than to a generic friend, because the CAC Score is about the California experience and a recommendation is only meaningful within that population. From the 0 to 10 answers we compute a net-recommend figure in the standard way, subtracting the share of detractors from the share of promoters. As reported elsewhere in this series, top-tier casinos earned a net-recommend of plus 58, mid-tier casinos plus 21, and bottom-tier casinos minus 9, a spread that tracks the CAC Scores closely and gives us an external check on the pillar batteries.
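The net-recommend arithmetic can be sketched as follows. The cut-offs used here, 9 to 10 for promoters and 0 to 6 for detractors, are the conventional Net Promoter bands and are an assumption of this sketch. The function name `net_recommend` and the sample answers are hypothetical.

```python
def net_recommend(scores):
    """Share of promoters minus share of detractors, in points.

    scores: iterable of 0-10 answers to the would-recommend question.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no answers supplied")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("answers must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)   # assumed band: 9-10
    detractors = sum(1 for s in scores if s <= 6)  # assumed band: 0-6
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers from ten respondents.
answers = [10, 9, 9, 8, 7, 10, 6, 3, 9, 10]
print(round(net_recommend(answers)))
```

Respondents answering 7 or 8 count in the denominator but in neither group, which is why the figure can range from minus 100 to plus 100.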

The qualitative core: open-ended prompts and depth interviews

Numbers tell you how much, but they do not tell you why, and a casino rating that cannot explain itself is not much use to a player deciding where to deposit. So the quantitative core was paired with a qualitative core. Every respondent was offered open-ended prompts, free-text questions where they could write in their own words. Two of the prompts we used were "Describe your most recent withdrawal experience" and "What almost stopped you from signing up as a California player?" The first prompt surfaces the lived texture of payouts, the waiting, the document requests, the relief or frustration. The second prompt is deliberately framed around hesitation, because the things that almost stop a Californian from signing up are exactly the friction points a rating should warn about, from geo-restrictions to confusing bonus terms.

On top of the open-ended prompts we conducted 60 follow-up depth interviews. These were longer, semi-structured conversations with a subset of respondents who agreed to talk further, again entirely voluntarily and with consent. The depth interviews let us probe stories that a text box cannot capture, to ask "and then what happened?" and to understand the sequence of events behind a frustrating payout or a smooth one. All qualitative material, both the open-ended text and the interview transcripts, was analyzed using the six-phase thematic analysis method described by Braun and Clarke in 2006: familiarization, generating initial codes, searching for themes, reviewing themes, defining and naming themes, and reporting. We used more than one coder and checked inter-coder agreement, so that the themes we report are not the idiosyncratic reading of a single analyst.

The qualitative analysis produced a set of recurring themes that we report in full elsewhere in this series, including payout reliability and speed, bonus-terms clarity and wagering, customer-support responsiveness, California acceptance and geo-restrictions, game and provider variety, and trust, licensing and fairness. For the purposes of this methodology paper, the important point is that these themes were derived from coded data, not invented, and that the qualitative findings rest on the thematic-analysis method rather than on the probability mathematics of the survey. The two cores answer different questions and stand on different foundations, and we keep that distinction clean.

How we sampled California: random and snowball methods

With the population defined, the eligibility rules set, and the instrument built, we come to the heart of this paper: how respondents were actually selected. We used two methods in combination. The backbone was stratified random sampling, the probability method that gives us the right to quote a margin of error. Supplementing it was snowball sampling, a non-probability method we used only to reach hard-to-find segments. We then re-weighted the whole sample so that it matched California's regional and demographic structure.

Recruitment by sampling method

[Pie chart: stratified random 78%, snowball referral 22%]
Stratified random sampling formed the backbone; snowball referral reached harder-to-find segments.

As the figure shows, 78 percent of the final sample was recruited through stratified random sampling and 22 percent through snowball referral. That ratio is deliberate. The probability backbone is by far the larger share, which is what allows the probability mathematics to govern the headline numbers, while the snowball portion is a minority supplement used where random recruitment alone would have left gaps.

Stratified random sampling: the probability backbone

Random sampling is the engine of statistical inference. When each member of a population has a known, non-zero chance of being selected, the laws of probability let us quantify how far the sample is likely to stray from the truth. That is what a margin of error is: a statement, grounded in probability, about how close the sample estimate is likely to be to the population value. Without random selection there is no honest way to compute one, which is why the random backbone is not optional for us.

We did not use simple random sampling across the whole state, because that would have risked over-representing the most populous regions purely by chance and under-representing the smaller ones. Instead we used stratified random sampling, the approach Cochran describes in his classic treatment of sampling techniques. We divided California into six strata corresponding to its major regions, set a target sample size for each stratum in proportion to that region's share of the California population, and then sampled randomly within each stratum. Stratification has two benefits. It guarantees that every region is present in the sample in roughly its true proportion, and it generally produces estimates that are at least as precise as simple random sampling, because it removes between-stratum variation from the error. The table below shows the six strata, each region's share of the California population, the achieved sample size in that region, and the margin of error that the achieved size supports.

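| Region | CA population share | Achieved n | Margin of error |
|---|---|---|---|
| Southern California | 58% | 2,446 | ±1.98% |
| San Francisco Bay Area | 20% | 843 | ±3.38% |
| Central Valley | 11% | 464 | ±4.55% |
| Sacramento Metro | 6% | 253 | ±6.16% |
| Central Coast | 3% | 127 | ±8.70% |
| North State | 2% | 84 | ±10.69% |
| Total | 100% | 4,217 | ±1.51% |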

Read this table carefully, because it shows the trade-off at the heart of stratified design. Southern California, with 58 percent of the state's population, contributes 2,446 respondents and carries a tight regional margin of error of plus or minus 1.98 percent. The North State, with just 2 percent of the population, contributes 84 respondents and carries a much wider regional margin of plus or minus 10.69 percent. That is exactly what we would expect: smaller samples are noisier. The crucial point is that when all six strata are combined into the full sample of 4,217, the overall margin of error collapses to plus or minus 1.51 percent. The statewide number is far more precise than any single region's number, because it pools the information from all six. When we report a result for a small region on its own, we keep its wider margin in mind and we say so. When we report a statewide CAC Score, the tight 1.51 percent figure applies.
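Proportional allocation, the rule that sets each stratum's target in line with its population share, is easy to sketch. The snippet below takes the shares from the table and rounds each target to the nearest whole respondent. The function name `proportional_targets` and the simple rounding rule are assumptions made for illustration, not a description of our fieldwork tooling.

```python
TOTAL_N = 4217

# Population shares in percent, taken from the stratification table.
SHARES_PCT = {
    "Southern California": 58,
    "San Francisco Bay Area": 20,
    "Central Valley": 11,
    "Sacramento Metro": 6,
    "Central Coast": 3,
    "North State": 2,
}


def proportional_targets(total_n, shares_pct):
    """Allocate total_n across strata in proportion to population share."""
    return {
        region: round(total_n * pct / 100)
        for region, pct in shares_pct.items()
    }


targets = proportional_targets(TOTAL_N, SHARES_PCT)
for region, n in targets.items():
    print(f"{region}: {n}")
print("Total:", sum(targets.values()))
```

With these shares the rounded targets add up to the full 4,217. In general, independent rounding can leave the total off by one or two, and a real allocation would need a rule for assigning the remainder.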

Sample stratified by California region

[Pie chart: Southern California 58%, San Francisco Bay Area 20%, Central Valley 11%, Sacramento Metro 6%, Central Coast 3%, North State 2%]
The N = 4,217 sample was stratified to mirror California's regional population share.

The pie reinforces the table visually. The dominant Southern California slice and the slim North State slice are not accidents of who happened to answer; they are targets we set in advance to match the state, and then filled with randomly selected respondents within each region. That is the difference between a sample that looks like California and a sample that merely happened.

Snowball sampling: reaching the hard-to-find

Random sampling is powerful, but it struggles with segments that are rare, dispersed, or reluctant to be found through ordinary channels. Among California online-casino players, certain segments are genuinely hard to reach at random: players in the sparsely populated North State, players who use specific crypto-first banking methods, and players who are active but private and do not join the panels or communities that a random frame draws on. To avoid simply leaving these people out, we supplemented the random backbone with snowball sampling, the chain-referral method formally described by Goodman in 1961. In snowball sampling, an initial set of eligible respondents is asked to refer other eligible people they know, who in turn refer others, so the sample grows like a rolling snowball through social networks that a random frame cannot easily touch.

We are deliberately careful about what snowball sampling can and cannot do, because this is where many casino "studies" quietly cheat. Snowball sampling is, by its nature, a non-probability method. Because referrals follow social ties, the selection probabilities are unknown and almost certainly unequal, which means the clean probability mathematics that underlies a margin of error does not apply to the snowball portion on its own. We do not pretend otherwise. The snowball respondents made up 22 percent of the final sample and were used to fill gaps in the harder-to-reach segments, not to carry the headline inference. The strict statistical claims in this series, the margins of error and the 95 percent confidence intervals, rest on the probability backbone. The snowball portion is supportive, re-weighted, and reported transparently as exactly what it is.

Re-weighting to match California

Combining a probability sample with a non-probability supplement raises an obvious question: how do we make sure the mixture still looks like California rather than looking like whoever happened to get referred? The answer is re-weighting. After all data was collected, we re-weighted the combined sample to the regional and demographic strata so that each region and each major demographic group appears in the analysis in its true California proportion. If snowball referral happened to over-recruit a particular age band or a particular region, re-weighting pulls that group's influence back down to its correct share, and pulls under-represented groups up. Re-weighting is the bridge that lets the snowball supplement strengthen coverage without distorting the overall picture. It does not turn the snowball portion into a probability sample, and we never claim it does. It simply ensures that the descriptive composition of the analyzed sample matches the population we are trying to describe.
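As an illustration of the idea, here is a minimal sketch of cell-based post-stratification, in which each group's weight is its population share divided by its share of the raw sample. This paper does not name a specific weighting algorithm, so treating it as simple post-stratification is an assumption of the sketch. The function name `poststrat_weights`, the three groups, and all the numbers are hypothetical.

```python
from collections import Counter


def poststrat_weights(sample_groups, population_share):
    """Weight for group g = population share of g / sample share of g."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_share[g] / (counts[g] / n) for g in counts}


# Hypothetical raw sample of ten in which "North" is over-recruited.
raw_sample = ["South"] * 5 + ["Bay"] * 2 + ["North"] * 3
# Hypothetical population shares for the same three groups.
population = {"South": 0.6, "Bay": 0.3, "North": 0.1}

weights = poststrat_weights(raw_sample, population)

# After weighting, each group's share of total weight matches the population.
total_weight = sum(weights[g] for g in raw_sample)
weighted_share = {
    g: sum(weights[x] for x in raw_sample if x == g) / total_weight
    for g in population
}
print(weights)
print(weighted_share)
```

Over-recruited groups receive weights below 1 and under-recruited groups receive weights above 1. The weights change the composition of the analyzed sample but do not create selection probabilities that were never known.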

The demographic shape of the sample

Having mirrored California by region, we also checked and re-weighted the sample against the demographic structure of the California online-casino audience. A sample can be regionally perfect and still be skewed on age, device or experience, so we examined all of these. The tables and figures in this section describe the composition of the analyzed sample of 4,217 respondents.

Respondents by age band

[Chart: 21-29 22%, 30-44 38%, 45-59 27%, 60+ 13%]
Age distribution of the verified California sample (all respondents 21 or older).

Primary device used to play

[Chart: Mobile 64%, Desktop 28%, Tablet 8%]
California players skew heavily mobile, which is why the mobile pillar is tested on iOS and Android.

| Dimension | Category | Share of sample |
|---|---|---|
| Age band | 21-29 | 22% |
| Age band | 30-44 | 38% |
| Age band | 45-59 | 27% |
| Age band | 60+ | 13% |
| Primary device | Mobile | 64% |
| Primary device | Desktop | 28% |
| Primary device | Tablet | 8% |
| Gender | Male | 61% |
| Gender | Female | 38% |
| Gender | Other or undisclosed | 1% |
| Experience | Under 1 year | 18% |
| Experience | 1 to 3 years | 41% |
| Experience | 3 to 5 years | 27% |
| Experience | 5 years or more | 14% |

The age profile is a working-age profile. The single largest band is 30 to 44 at 38 percent, followed by 45 to 59 at 27 percent and 21 to 29 at 22 percent, with the 60-plus band making up the remaining 13 percent. This is what we would expect from an active online-casino audience, and it has direct consequences for how we weight the pillars and interpret results. A sample dominated by younger players would over-state the importance of flashy mobile features, while a sample dominated by older players might under-state them. Matching the real age distribution keeps the CAC Score honest about which features actually matter to the California audience as a whole.

The device profile is the most consequential single demographic fact in the whole study. Nearly two thirds of respondents, 64 percent, play primarily on mobile, against 28 percent on desktop and just 8 percent on tablet. California players are mobile players. That is precisely why we test the Mobile and Responsible Gambling pillar on real iOS and Android devices rather than judging it from a desktop browser. If we sampled a desktop-heavy population we would design our hands-on testing differently, and we would mislead our readers. The sample shape drives the test design, which is one more reason the sampling chapter has to come before the results chapter.

On gender, the sample is 61 percent male, 38 percent female, and 1 percent other or undisclosed. That 1 percent undisclosed slice is a small but real reminder of the voluntary, skip-allowed nature of the instrument: respondents who preferred not to state a gender simply did not, and we report them as their own category rather than forcing them into one of the others. On experience, the sample is weighted toward established players, with 41 percent reporting 1 to 3 years of play and a further 27 percent reporting 3 to 5 years, while 18 percent are relative newcomers under a year and 14 percent are veterans of 5 years or more. Experienced players give us reliable accounts of payouts and support over time, while the 18 percent of newer players keep the sample anchored to the sign-up experience that a prospective California player is about to face.

Sample-size mathematics and the margin of error

This is the section that justifies the headline claim of plus or minus 1.51 percent. We will build it up step by step, because the whole credibility of a quantitative rating rests on this arithmetic being correct and being shown rather than asserted. Everything here applies to the probability backbone of the design, the stratified random sample, which is the portion entitled to inferential claims.

The sample-size formula

The starting point is the classic formula for the sample size needed to estimate a population proportion at a given confidence level and margin of error.

n = z² · p(1 − p) / e²

Here n is the required sample size, z is the standard-normal critical value for the chosen confidence level, p is the assumed population proportion, and e is the desired margin of error expressed as a proportion. For 95 percent confidence the critical value z is 1.96, the familiar number that captures the central 95 percent of a normal distribution. The term p(1 minus p) is the variance of a proportion, and it reaches its maximum when p equals 0.5. We deliberately assume p equals 0.5 throughout, because that is the most conservative possible choice: it produces the largest required sample size and therefore the most cautious margin of error. By assuming the worst case for variance, we guarantee that our real margin of error is no worse than what we report, and is in fact slightly better for any proportion away from 0.5.
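The formula translates directly into code. The sketch below uses the same symbols as the text (z, p, e) and rounds up, since a fractional respondent cannot be recruited. The function name `required_sample_size` is hypothetical.

```python
import math


def required_sample_size(e, p=0.5, z=1.96):
    """n = z^2 * p(1 - p) / e^2, rounded up to a whole respondent."""
    if not 0 < e < 1:
        raise ValueError("e must be a proportion between 0 and 1")
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


# A familiar benchmark: plus or minus 5 percent at 95 percent confidence.
print(required_sample_size(0.05))
# The sample needed for the plus or minus 1.51 percent quoted in this paper.
print(required_sample_size(0.0151))
# Moving p away from 0.5 lowers the requirement, which is why 0.5 is the
# conservative choice.
print(required_sample_size(0.05, p=0.2))
```

Because e is squared in the denominator, halving the margin of error roughly quadruples the required sample.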

The finite population correction

The basic formula assumes an effectively infinite population. When the sample is a non-trivial fraction of a finite population, we can do better, because each respondent we draw removes a little remaining uncertainty about the population. The finite population correction adjusts the required sample size downward to reflect this.

n₀ = n / (1 + (n − 1)/N)

In this expression n is the uncorrected sample size from the first formula, N is the size of the finite population, and n-zero is the corrected sample size. The correction matters most when the sampling fraction is large. In our case the sampling fraction is only about 1 percent of the active California online-casino audience, so the correction is modest: with a sampling fraction that small, the term (n minus 1) divided by N is tiny, and the corrected sample size is only slightly below the uncorrected one. We apply the correction for completeness and honesty, but we note that at a 1 percent sampling fraction it barely moves the numbers. This is one of the quiet advantages of fielding a large but still small-fraction sample: we get the precision of a big sample without needing to lean heavily on the finite-population adjustment.
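A short sketch shows how little the correction moves the numbers at a small sampling fraction. The population size of 421,700 used here is only what "roughly 1 percent" implies for a sample of 4,217. It is an illustrative assumption, not a measured audience size, and the function name `fpc_sample_size` is hypothetical.

```python
def fpc_sample_size(n, population_size):
    """Corrected size n0 = n / (1 + (n - 1) / N), in this paper's notation."""
    if n <= 0 or population_size <= 0:
        raise ValueError("n and population_size must be positive")
    return n / (1 + (n - 1) / population_size)


# Assumed population: 4,217 respondents at a roughly 1 percent fraction.
ASSUMED_POPULATION = 421_700

# Uncorrected requirement for plus or minus 5 percent (before rounding up).
n_uncorrected = 1.96 ** 2 * 0.25 / 0.05 ** 2

large_pop = fpc_sample_size(n_uncorrected, ASSUMED_POPULATION)
small_pop = fpc_sample_size(n_uncorrected, 2_000)  # contrast: a small population

print(f"uncorrected:           {n_uncorrected:.1f}")
print(f"N = {ASSUMED_POPULATION:,}: {large_pop:.1f}")
print(f"N = 2,000:   {small_pop:.1f}")
```

Against the assumed population the corrected size drops by well under one respondent, while against a population of 2,000 it drops by dozens.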

The margin-of-error working

To report a margin of error, we rearrange the relationship to solve for e given the achieved sample size n. The margin of error for a proportion at a chosen confidence level is:

e = z · √( p(1 − p) / n )

Now we substitute the actual numbers. We use the most conservative proportion, p equals 0.5, so that p(1 minus p) equals 0.25. We use the 95 percent critical value, z equals 1.96. And we use the full achieved sample size, n equals 4,217. The arithmetic runs as follows. First, 0.25 divided by 4,217 equals approximately 0.0000593. Next, the square root of 0.0000593 is approximately 0.0077. Finally, multiplying by 1.96 gives approximately 0.0151. Written out in a single line:

e = 1.96 · √( 0.25 / 4217 ) = 0.0151

That is a margin of error of plus or minus 1.51 percent at the 95 percent confidence level, with alpha equal to 0.05. In plain language, if we repeated this study many times under the same design, we would expect the survey estimate to fall within 1.51 percentage points of the true population value in about 95 of every 100 repetitions. This is the single most important number in our methodology, and we have now shown exactly where it comes from: the conservative proportion of 0.5, the 95 percent critical value of 1.96, and the achieved sample of 4,217.

The same machinery generates the regional margins of error in the stratification table. Each region's margin is computed by plugging that region's achieved n into the margin-of-error formula with the same p equals 0.5 and z equals 1.96. That is why Southern California with 2,446 respondents lands at plus or minus 1.98 percent while North State with only 84 respondents lands at plus or minus 10.69 percent: smaller n, larger e, exactly as the formula demands. The numbers in that table are not decorative. They are the formula applied region by region, and you can verify any of them with the equation above.
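For readers who would rather check the figures in code than by hand, here is a minimal sketch that applies the margin-of-error formula to the achieved n for each region and for the full sample. The achieved sizes come from the stratification table. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """e = z * sqrt(p(1 - p) / n), returned as a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Achieved sample sizes from the stratification table.
ACHIEVED_N = {
    "Southern California": 2446,
    "San Francisco Bay Area": 843,
    "Central Valley": 464,
    "Sacramento Metro": 253,
    "Central Coast": 127,
    "North State": 84,
    "Total": 4217,
}

for region, n in ACHIEVED_N.items():
    print(f"{region}: ±{100 * margin_of_error(n):.2f}%")
```

Because n sits under a square root, precision improves slowly: the full sample is about 50 times the size of the North State stratum, yet its margin is only about 7 times tighter.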

Confidence intervals around the pillar means

The proportion mathematics governs percentages, but several of our key results are means on the 1 to 5 Likert scale rather than proportions. For means we use the standard confidence-interval formula:

x̄ ± 1.96 · (s / √n)

Here x-bar is the sample mean, s is the sample standard deviation, n is the sample size, and 1.96 is again the 95 percent critical value. The quantity s divided by the square root of n is the standard error of the mean, and multiplying it by 1.96 gives the half-width of the 95 percent confidence interval. The reason our pillar means carry such tight intervals, on the order of plus or minus 0.02 on the 1 to 5 scale, is precisely the large n: dividing by the square root of 4,217 shrinks the standard error dramatically. This is the same large-sample dividend that produced the tight 1.51 percent proportion margin, expressed in the language of means. The sample variance that feeds these intervals is computed in the usual way:

s² = Σ(xᵢ − x̄)² / (n − 1)

where the sum runs over all respondents, x-sub-i is an individual response, x-bar is the mean, and we divide by n minus 1 rather than n to obtain an unbiased estimate of the population variance. Reporting the variance and standard deviation alongside the mean is not optional in our view; a mean without a measure of spread hides how much disagreement sits behind it, and disagreement is itself a finding.
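The two formulas above can be sketched together in Python. The ten responses below are hypothetical values chosen only to make the arithmetic easy to follow; they are not survey data, and the function name mean_ci is illustrative. Python's statistics.stdev uses the n minus 1 denominator, so it matches the variance formula given here.

```python
import math
import statistics

def mean_ci(responses, z=1.96):
    """Return (mean, sample standard deviation, CI half-width) for a list of scores."""
    n = len(responses)
    mean = statistics.fmean(responses)
    s = statistics.stdev(responses)      # square root of sum((x - mean)^2) / (n - 1)
    half_width = z * s / math.sqrt(n)    # z times the standard error of the mean
    return mean, s, half_width

# Hypothetical 1-to-5 Likert responses, for illustration only.
responses = [5, 4, 4, 3, 5, 4, 2, 4, 5, 4]
mean, s, half_width = mean_ci(responses)
print(f"mean = {mean:.2f}, s = {s:.3f}, 95% CI = {mean:.2f} +/- {half_width:.3f}")

# The same spread with a far larger n gives a far tighter interval.
for n in (84, 2446, 4217):
    print(f"n = {n}: half-width = {1.96 * s / math.sqrt(n):.3f}")
```

As a point of reference, a half-width of 0.02 at n equals 4,217 corresponds to a standard deviation of roughly 0.66 on the 1 to 5 scale, since 0.02 times the square root of 4,217 divided by 1.96 is about 0.66.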

Validity, reliability and limitations

A defensible methodology does not stop at computing a margin of error. It has to argue that the instrument measures what it claims to measure (validity), that it measures consistently (reliability), and that its weaknesses are stated openly (limitations). Crucially, each of these arguments has to be tied back to how the sampling was actually done, rather than asserted in the abstract.

Why the probability backbone justifies the margin of error

The single most important validity claim in this paper is that our margin of error is meaningful, and that claim rests entirely on the probability backbone. Because 78 percent of the sample was recruited through stratified random sampling, in which eligible Californians within each region had a known and non-zero chance of selection, the probability mathematics of the previous section genuinely applies. A margin of error is a probability statement, and probability statements require probability sampling. We are entitled to say plus or minus 1.51 percent at 95 percent confidence precisely because the backbone is random and because the achieved sample is large. This is the link between the sampling design and the headline number: the design earns the number. If we had simply gathered whoever volunteered, the 1.51 percent figure would be meaningless decoration, and we would not print it.

Why snowball sampling only supplements

The flip side of that argument is our honesty about the snowball portion. Snowball sampling, following Goodman in 1961, is a non-probability method. The 22 percent of the sample recruited through referral does not carry known selection probabilities, so strict statistical inference does not extend to it on its own. We handle this in three explicit ways. We keep the snowball portion a minority of the sample so it cannot dominate the design. We re-weight the combined sample to the regional and demographic strata so that the snowball respondents do not distort the composition. And we report the snowball portion transparently as exactly what it is, a supplement that improves coverage of hard-to-reach segments rather than a basis for inference. The reader should come away understanding that the confidence claims belong to the probability portion and that the snowball portion is supportive, re-weighted, and disclosed. That separation is itself a validity safeguard: we are not laundering a convenience sample into a probability claim.
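To make the re-weighting step concrete, here is a minimal sketch of simple post-stratification, in which each stratum's weight is its population share divided by its sample share. This is an assumption for illustration: the paper does not specify the exact weighting procedure here, and the stratum labels, counts and population shares below are hypothetical, not our actual strata.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per stratum = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }

# Hypothetical strata: stratum C is under-represented in the sample.
sample_counts = {"A": 600, "B": 300, "C": 100}
population_shares = {"A": 0.5, "B": 0.3, "C": 0.2}

weights = poststratification_weights(sample_counts, population_shares)
for stratum, w in weights.items():
    print(f"stratum {stratum}: weight = {w:.3f}")

# After weighting, the weighted counts match the population shares
# and still sum to the original sample size.
weighted_total = sum(sample_counts[s] * weights[s] for s in sample_counts)
print(f"weighted total = {weighted_total:.0f}")
```

In this toy example, respondents in the over-represented stratum A count for less than one each and those in the under-represented stratum C count for two each, so the weighted sample mirrors the population composition regardless of how each respondent was recruited.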

Reliability: internal consistency via Cronbach's alpha

Reliability asks whether the instrument measures consistently. Because each pillar uses a multi-item Likert battery rather than a single question, we can quantify internal consistency directly with Cronbach's alpha, the coefficient introduced by Cronbach in 1951. Alpha measures the degree to which the items in a battery move together, on the reasonable assumption that items measuring the same underlying construct should correlate. The coefficient is computed as:

α = (k / (k − 1)) · (1 − Σsᵢ² / sₜ²)

where k is the number of items in the battery, the sum in the numerator is the total of the individual item variances, and s-sub-t-squared is the variance of the summed scale. Across our pillar batteries, alpha ranged from 0.84 to 0.91. By the conventional rules of thumb, values above 0.7 are acceptable, above 0.8 are good, and above 0.9 are excellent, so our batteries sit in the good-to-excellent band throughout. Game Selection, at alpha equals 0.91, was the most internally consistent pillar, while Bonuses and Value, at 0.84, was the least, which makes intuitive sense given how varied bonus structures are across casinos. These coefficients are reported pillar by pillar elsewhere in this series. The point for this methodology paper is that reliability is not a hope but a measured quantity, and it ties directly back to the decision to use multi-item batteries during instrument design.
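The alpha formula is short enough to compute by hand or in a few lines of Python. The sketch below is illustrative only: the three-item battery and its six respondents are hypothetical, the function name cronbach_alpha is our own, and the reported coefficients come from SPSS rather than from this code.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a battery.

    items: list of k lists, one per item, each holding that item's
    scores for the same respondents in the same order.
    """
    k = len(items)
    # Sum of the individual item variances (n - 1 denominator).
    item_variance_sum = sum(statistics.variance(scores) for scores in items)
    # Variance of each respondent's summed scale score.
    totals = [sum(row) for row in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical 3-item battery answered by 6 respondents.
battery = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
]
print(f"alpha = {cronbach_alpha(battery):.3f}")
```

When items track each other closely, the variance of the summed scale is large relative to the sum of the item variances, and alpha approaches 1; when items are unrelated, the two quantities are similar and alpha falls toward 0.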

Construct validity: pillars grounded in established frameworks

Construct validity asks whether each pillar measures the thing it names. We protect construct validity by grounding each pillar in an established theoretical framework rather than inventing it ad hoc. The Trust and Licensing pillar is built on the integrative trust model of Mayer, Davis and Schoorman from 1995. The would-recommend and overall satisfaction work draws on Oliver's expectation-confirmation theory from 1980. The Customer Support pillar is built on the SERVQUAL service-quality framework of Parasuraman, Zeithaml and Berry from 1988. The Mobile and Responsible Gambling pillar draws on the Technology Acceptance Model of Davis from 1989. The Bonuses and Value pillar is informed by prospect theory and the heuristics-and-biases program of Kahneman and Tversky, from 1979 and 1974 respectively, which explain how players actually weigh bonus framing and wagering risk. The net-recommend metric follows Reichheld from 2003. Because each construct comes from a literature with its own validation history, our items inherit a measure of validity from the frameworks they operationalize, rather than relying solely on our own judgment about what to ask.

Qualitative validity

The qualitative findings rest on a different foundation, and we are careful not to borrow the probability mathematics to prop them up. The open-ended responses and the 60 depth interviews were analyzed using the six-phase thematic analysis method of Braun and Clarke from 2006, with more than one coder and a check on inter-coder agreement. The validity of the themes comes from the rigor and transparency of that coding process, from familiarization through coding, theme generation, review, definition and reporting, not from any margin of error. We keep this distinction explicit because conflating qualitative trustworthiness with quantitative precision is a common and misleading move, and we refuse to make it.

Limitations stated honestly

No survey is perfect, and a methodology that claims otherwise is not to be trusted. We state our limitations plainly. First, self-selection bias is possible: people who agree to take a casino survey may differ from those who decline, and even a probability backbone cannot fully eliminate the tendency for the willing to differ from the unwilling. Second, recall bias is possible: a player describing a withdrawal from weeks ago may remember it as faster or slower than it was. We mitigate recall bias on the most consequential claims by cross-checking self-reported payout times against our own hands-on testing, in which we deposit, play and withdraw real money and time the results ourselves, so the survey's memory of payouts is anchored to measured reality rather than left to stand alone. Third, the coverage gap between the true population and the reachable frame remains, which is why we recruit through several channels and re-weight. Fourth, and most fundamentally, our findings describe California players in spring 2026. Casinos change their payout speeds, their bonus terms and their support staffing over time, and player expectations move too, so a CAC Score is a snapshot of a moment and not a permanent verdict. We expect to re-field and update, and we will say so when we do.

Analysis software and fieldwork timing

For completeness we record the practical machinery behind the numbers. All quantitative analysis, including the descriptive statistics, the confidence intervals, the independent-samples t-tests, the one-way ANOVA across regions, and the Cronbach's alpha reliability coefficients, was conducted in IBM SPSS Statistics. Using a single, well-documented statistical package rather than ad hoc spreadsheets reduces the chance of arithmetic error and makes our pipeline auditable. The fieldwork itself ran across the weeks preceding publication, in spring 2026, which is the window the limitations section refers to when it says the findings are a snapshot of that period. We mention the inferential tests here only to note where they were run; their full results, including the crypto-versus-fiat payout t-test at t(4215) equals 18.7 with p below .001 and the regional payout ANOVA at F(5, 4211) equals 2.94 with p equals .012, are interpreted in the results part of this series rather than in this methodology paper.
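The degrees of freedom in those two test statistics follow directly from the sample size, which gives a quick consistency check. The sketch below assumes both tests used the full sample of 4,217 and that the ANOVA compared six regions, a count inferred from the numerator degrees of freedom of 5 rather than stated in this section; the function names are illustrative.

```python
def t_test_df(n_total):
    """Degrees of freedom for a two-group independent-samples t-test with pooled variance."""
    return n_total - 2

def anova_df(n_total, n_groups):
    """(between-groups, within-groups) degrees of freedom for a one-way ANOVA."""
    return n_groups - 1, n_total - n_groups

n = 4217      # achieved sample size
regions = 6   # assumption: inferred from F(5, ...), since groups - 1 = 5

print(t_test_df(n))          # 4215, matching t(4215)
print(anova_df(n, regions))  # (5, 4211), matching F(5, 4211)
```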

How this paper connects to the rest of the series

This methodology paper is one quarter of a larger argument. Part one sets out the story of CA Casinos, Palo Alto Organization, the origin of the CAC Score in a Palo Alto player panel, and the meaning of the eight weighted pillars. This part, part two, has shown how we sampled California: the target population and frame, the eligibility and consent rules, the survey instrument, the stratified random backbone and the snowball supplement, the re-weighting to California's strata, the sample-size mathematics that yields plus or minus 1.51 percent, and the validity, reliability and limitations that make those numbers defensible. The remaining two parts build directly on this foundation. The results paper interprets the descriptive statistics, the confidence intervals and the inferential tests across the fifteen reviewed casinos, and the final paper translates everything into the weighted CAC Score and the California-lens verdicts you see on each review. We have written this part so that, by the time you read the results, you already know exactly how much weight each number can bear and why.

If you want to see the rating itself and how the eight pillars combine into a single score out of 100, visit /cac-score/, where the methodology summarized here is put to work on every casino we review for California players.

  1. Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995). An Integrative Model of Organizational Trust. Academy of Management Review, 20(3), 709-734.
  2. Goodman, L. A. (1961). Snowball Sampling. Annals of Mathematical Statistics, 32(1), 148-170.
  3. Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.
  4. Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method. Wiley.
  5. Likert, R. (1932). A Technique for the Measurement of Attitudes. Archives of Psychology, 140, 1-55.
  6. Oliver, R. L. (1980). A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions. Journal of Marketing Research, 17(4), 460-469.
  7. Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1988). SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality. Journal of Retailing, 64(1), 12-40.
  8. Davis, F. D. (1989). Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly, 13(3), 319-340.
  9. Kahneman, D., & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47(2), 263-291.
  10. Cronbach, L. J. (1951). Coefficient Alpha and the Internal Structure of Tests. Psychometrika, 16(3), 297-334.
  11. Reichheld, F. F. (2003). The One Number You Need to Grow. Harvard Business Review, 81(12), 46-54.
  12. Braun, V., & Clarke, V. (2006). Using Thematic Analysis in Psychology. Qualitative Research in Psychology, 3(2), 77-101.
  13. Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131.
Play responsibly. Gambling is intended for adults aged 21 and older. If you or someone you know has a gambling problem, call 1-800-522-4700 or visit the California Council on Problem Gambling.