Frequently asked questions

Which version of the data am I using?

Access the dataset label to determine the version of the extracts you are using:

Stata
use epi_cpsorg_2019.dta, clear

* describe the data to view the version
describe, short

* save the version information in a local macro
local dataversion: data label
display "`dataversion'"

R
mydata <- haven::read_dta("epi_cpsorg_2019.dta")
dataversion <- attr(mydata, "label")

What sample restrictions are used in the EPI extracts?

The EPI CPS Basic monthly and May extracts are restricted to individuals with non-missing, non-negative ages. The EPI CPS ORG extracts are restricted to individuals ages 16 and above who have a positive earner sample weight (orgwgt) and are in an outgoing rotation month (minsamp equal to 4 or 8).

As a result, the sample of individuals in the EPI extracts is sometimes smaller than the sample in the raw, underlying CPS data, which can include nonresponding households and, in the case of the underlying ORG data, individuals below the age of 16.

Basic/May sample restriction
********************************************************************************
* BASIC/MAY SAMPLE RESTRICTION
********************************************************************************
* only include those with non-missing, non-negative age
if $marchcps == 1 {
    if tm(1962m1) <= $date & $date <= tm(1997m12) {
        drop if age < 0
        assert age >= 0 & age ~= .
    }
    if tm(1998m1) <= $date {
        drop if a_age < 0
        assert a_age >= 0 & a_age ~= .
    }
}

if $monthlycps == 1 | $maycps == 1 {
    if tm(1973m1) <= $date & $date <= tm(1993m12) {
        drop if age == .
        assert age >= 0
    }
    if tm(1994m1) <= $date & $date <= tm(2012m4) {
        drop if peage < 0
        assert peage >= 0 & peage ~= .
    }
    if tm(2012m5) <= $date {
        drop if prtage < 0
        assert prtage >= 0 & prtage ~= .
    }
}

ORG sample restriction
********************************************************************************
* ORG SAMPLE RESTRICTION
********************************************************************************
* restrict to outgoing months
keep if minsamp == 4 | minsamp == 8

* restrict to positive earnings weight
keep if orgwgt > 0 & orgwgt ~= .

* restrict ORG sample to 16 and above
keep if age >= 16 & age ~= .

Which sample weight variable should I use?

There are four sample weights available in the CPS extracts:

Characteristic       basicwgt   cmpwgt          finalwgt   orgwgt
Years available      All years  1998 - present  All years  1979 - present
Samples available    All        Basic/ORG       All        ORG
Sample restrictions  Ages 16+   Ages 16+        None       Earner study

While there is not always a single correct answer regarding which weight you should use, here are some helpful guidelines:

  • For many outcomes in the CPS Basic or May data, use basicwgt to analyze the population ages 16 and over.

  • If your analysis involves the ORG data, such as earnings information, use orgwgt.

  • finalwgt is the only weight defined for individuals under the age of 16.

  • If you want to match monthly labor force statistics published by the BLS, such as unemployment rates, you can do so from 1998 onward using basicwgt or cmpwgt, which have the same values during that period.

In a given month, the above weights sum to the total monthly population estimate of the relevant sample.
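
As an illustration of how a weight enters a calculation, the sketch below computes a weighted unemployment rate and the weighted population total on a small made-up dataset. The indicator names emp and unemp are assumptions about how employment status is coded, and the weight values are invented; with a real extract you would load the file with use instead of typing data with input.

```stata
* Toy data standing in for one month of an extract; all values are made up.
clear
input emp unemp basicwgt
1 0 1500
1 0 2000
0 1 500
0 0 1000
end

* Labor force = employed or unemployed (emp and unemp assumed to be 0/1 indicators)
generate byte inlf = (emp == 1 | unemp == 1)

* Weighted unemployment rate: weighted mean of unemp within the labor force
summarize unemp [aweight = basicwgt] if inlf == 1
display "Unemployment rate: " %5.3f r(mean)

* Weighted count of everyone in the toy sample (the sum of the weights)
summarize basicwgt
display "Population represented: " r(sum)
```

When several months are pooled, the weights sum to a multiple of the monthly population, so totals (though not rates) need to be divided by the number of months.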

Which wage variable should I use?

There are several hourly wage variables in the EPI CPS extracts, but for many purposes we recommend using wage or wageotc.

wageotc includes overtime, tips, and commissions (OTC) for hourly workers, but is only available for 1994-present. wage is available for all years, but does not include OTC payments for hourly workers.

Both wage and wageotc include several adjustments by EPI to improve the quality of the data: top-code imputations, hours imputations, and the trimming of outliers (see the wage methodology for more details). However, for convenience there are variables that exclude these adjustments, as described in the table below:

Characteristic          wage       wageotc         wage_noadj  wageotc_noadj
Availability            All years  1994 - present  All years   1994 - present
OTC for hourly workers  No         Yes             No          Yes
Top-code imputations    Yes        Yes             No          No
Hours vary imputations  Yes        Yes             No          No
Trimming                Yes        Yes             No          No
BLS imputations         Yes        Yes             Yes         Yes

Additionally, if you want to use a wage variable without any weekly or hourly earnings imputations by EPI or BLS, you can incorporate the allocation flags a_weekpay and a_earnhour.

Example Stata code to exclude EPI and BLS imputations
* Be aware that the allocation indicators are not consistent over time.
* In particular, there is no allocation information at all during Jan 1994 - August 1995.
gen wage_noimpute = wage_noadj
replace wage_noimpute = . if paidhre == 1 & a_earnhour == 1
replace wage_noimpute = . if paidhre == 0 & a_weekpay == 1
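
To see what these replacements do, the following sketch applies the same logic to a small invented dataset and then computes a weighted median of the remaining wages. All values are made up, and using _pctile with orgwgt as a pweight is one reasonable choice for a weighted median rather than a prescribed method.

```stata
* Toy ORG-style records; every value here is invented for illustration.
clear
input wage_noadj paidhre a_earnhour a_weekpay orgwgt
12.50 1 0 0 1000
18.00 1 1 0 3000
25.00 0 0 1 1500
40.00 0 0 0 500
22.00 1 0 1 2000
end

* Drop wages whose underlying earnings item was allocated (imputed)
generate wage_noimpute = wage_noadj
replace wage_noimpute = . if paidhre == 1 & a_earnhour == 1
replace wage_noimpute = . if paidhre == 0 & a_weekpay == 1

* Number of records excluded by the allocation flags
count if missing(wage_noimpute)

* Weighted median of the non-imputed wages (missing values are ignored)
_pctile wage_noimpute [pweight = orgwgt], percentiles(50)
display "Weighted median wage: " r(r1)
```

Note that the last toy record is paid hourly, so its a_weekpay flag does not cause it to be dropped.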

How do I merge the EPI CPS extracts to other sources of CPS extracts?

For years 1994-present, the following variables uniquely identify observations in the EPI CPS extracts:

year month statefips hrhhid hrhhid2 hrsersuf hrsample huhhnum pulineno

These variables are present in the raw data available from Census, NBER, or IPUMS.
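
A minimal sketch of such a merge is shown below, using two tiny invented datasets in place of real files. The variable othervar and all values are hypothetical. With real data, the identifier names, capitalization, and storage types in the other source may need to be harmonized before merging, which is an assumption worth checking for each source.

```stata
* Identifier variables listed above
local idvars year month statefips hrhhid hrhhid2 hrsersuf hrsample huhhnum pulineno

* Hypothetical extract from another source, carrying one extra variable
clear
input year month statefips hrhhid hrhhid2 hrsersuf hrsample huhhnum pulineno othervar
2019 1 6 1001 11 0 1 1 1 5
2019 1 6 1001 11 0 1 1 2 7
end
isid `idvars'
tempfile other
save "`other'"

* Toy stand-in for the EPI extract
clear
input year month statefips hrhhid hrhhid2 hrsersuf hrsample huhhnum pulineno wage
2019 1 6 1001 11 0 1 1 1 20
2019 1 6 1001 11 0 1 1 2 15
2019 1 6 1002 11 0 1 1 1 30
end

* Confirm the key is unique, then merge one-to-one and inspect the match
isid `idvars'
merge 1:1 `idvars' using "`other'"
tabulate _merge
```

Running isid on both files before the merge makes a non-unique key fail early instead of surfacing as a merge error.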

For years prior to 1994, EPI CPS extracts are based on Unicon source data. For these years the EPI variable unicon_recnum uniquely identifies observations and will match to Unicon's recnum variable.
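
For the Unicon-based years, a merge might look like the sketch below, written for a single monthly file with invented values. Whether recnum repeats across months is not stated here, so treating it as unique only within one file is an assumption; when stacking several months, check the key with isid and add year and month if needed.

```stata
* Hypothetical Unicon-based file for one month; values are made up
clear
input recnum othervar
1 5
2 7
3 9
end

* Align the identifier name with the EPI extract
rename recnum unicon_recnum
isid unicon_recnum
tempfile unicon
save "`unicon'"

* Toy stand-in for the matching month of the EPI extract
clear
input unicon_recnum wage
1 20
2 15
3 30
end

* One-to-one merge; assert(match) stops with an error if any record fails to match
merge 1:1 unicon_recnum using "`unicon'", assert(match)
```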