QC flags


By default, QC flags are applied: for each numeric data column that has a corresponding QC flag column, any value whose QC flag is not “0” is set to NaN. Pass apply_qc=False to skip this step and keep the original values.
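The masking described above can be sketched on a toy table. This is only an illustration of the idea, not uscrn's actual implementation; the column name solarad_flag and the helper mask_flagged are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy frame: one data column and its string-valued QC flag column.
# The column names are hypothetical, chosen only for illustration.
toy = pd.DataFrame(
    {
        "solarad": [120.0, 135.0, 0.0, 98.0],
        "solarad_flag": ["0", "3", "0", "3"],
    }
)


def mask_flagged(df, data_col, flag_col, good="0"):
    """Return a copy of `df` with `data_col` set to NaN where `flag_col` != `good`."""
    out = df.copy()
    # Series.where keeps values where the condition is True and fills the rest
    out[data_col] = out[data_col].where(out[flag_col] == good, np.nan)
    return out


masked = mask_flagged(toy, "solarad", "solarad_flag")
print(masked)
```

The input frame is left untouched, so the masked and unmasked versions can be compared side by side.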

For more information about selecting sites, see Select sites. For more information about loading data, see Daily data / uscrn.get_data() and NRT data / uscrn.get_nrt_data().

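import pandas as pd

import uscrn

station_id = "1045"  # Boulder, CO

# Default behavior: QC flags applied (flagged values set to NaN)
df = uscrn.get_data(2019, "hourly", station_id=station_id, n_jobs=1)

# Same data with the QC step skipped, for comparison
df_no_qc = uscrn.get_data(2019, "hourly", station_id=station_id, apply_qc=False)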
Discovering files...
1 file(s) found
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/2019/CRNH0203-2019-CO_Boulder_14_W.txt
Reading files...
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.5s
Discovering files...
1 file(s) found
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/2019/CRNH0203-2019-CO_Boulder_14_W.txt
Reading files...
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.5s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.5s
# Variables that have an associated QC flag column
qc_vns = [k for k, v in df.attrs["attrs"].items() if v["qc_flag_name"]]

# Count the occurrences of each QC flag value, one row per variable
counts = []
for vn in qc_vns:
    fn = df.attrs["attrs"][vn]["qc_flag_name"]
    counts.append(df[fn].value_counts().convert_dtypes().rename(vn))

counts = pd.DataFrame(counts)
counts
                   0     3
solarad       8756.0   4.0
solarad_max   8750.0  10.0
solarad_min   8756.0   4.0
sur_temp      8756.0   4.0
sur_temp_max  8756.0   4.0
sur_temp_min  8756.0   4.0
rh_hr_avg     8760.0   NaN
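# Pick the variable with the fewest "0" (good) flags
vn = counts.sort_values(by="0").iloc[0].name

# Compare null counts with and without QC applied
pd.concat(
    [
        df[vn].isnull().value_counts().rename("qc"),
        df_no_qc[vn].isnull().value_counts().rename("no qc"),
    ],
    axis=1,
)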
solarad_max    qc  no qc
False        8749   8759
True           11      1
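In the comparison above, QC adds 11 - 1 = 10 nulls for solarad_max, which matches the ten rows flagged "3" for that variable. The same check could be run over several columns at once. The following is a minimal sketch on toy data; the helper name qc_null_summary and the toy column names are hypothetical and not part of uscrn.

```python
import numpy as np
import pandas as pd


def qc_null_summary(df_qc, df_raw, columns):
    """Tabulate null counts with and without QC for the given columns."""
    summary = pd.DataFrame(
        {
            "null (qc)": df_qc[columns].isnull().sum(),
            "null (no qc)": df_raw[columns].isnull().sum(),
        }
    )
    # Values that were present in the raw data but set to NaN by QC
    summary["masked by qc"] = summary["null (qc)"] - summary["null (no qc)"]
    return summary


# Toy stand-ins for df_no_qc and df
raw = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [1.0, 2.0, 3.0]})
qc = raw.copy()
qc.loc[0, "a"] = np.nan  # pretend the first value of "a" was flagged

summary = qc_null_summary(qc, raw, ["a", "b"])
print(summary)
```

With the real data, the list from qc_vns could presumably be passed as columns, assuming those names are columns in both frames.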
df.sur_temp_type.value_counts()
sur_temp_type
C    8759
U       1
Name: count, dtype: int64

IR surface measurement type

NRT data are (presumably) more likely to contain non-corrected values, that is, rows where sur_temp_type is something other than "C".

df = uscrn.get_nrt_data((-4, None), "hourly", n_jobs=2)
Discovering files...
  Looking for files in these years
  - 2025
Found 4 file(s) to load
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/updates/2025/CRN60H0203-202504161000.txt
...
https://www.ncei.noaa.gov/pub/data/uscrn/products/hourly02/updates/2025/CRN60H0203-202504161300.txt
Reading files...
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.9s
[Parallel(n_jobs=2)]: Done   2 out of   4 | elapsed:    1.0s remaining:    1.0s
[Parallel(n_jobs=2)]: Done   4 out of   4 | elapsed:    1.2s finished
df.sur_temp_type.value_counts()
sur_temp_type
C    564
U     60
Name: count, dtype: int64
wbans = sorted(df.query("sur_temp_type == 'U'").wban.unique())
print(wbans)
print(len(wbans))
['23801', '23802', '63862', '63867', '63868', '63891', '63892', '63893', '63894', '63895', '63897', '63899', '73801', '73802', '73803']
15
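To see how the "U" rows are distributed across stations, rather than only which stations have them, the type counts can be cross-tabulated by station. The sketch below uses a toy frame with made-up WBAN values; only the wban and sur_temp_type column names are taken from the data above.

```python
import pandas as pd

# Toy stand-in for the NRT frame; the WBAN values are hypothetical
toy = pd.DataFrame(
    {
        "wban": ["00001", "00001", "00002", "00002", "00003"],
        "sur_temp_type": ["C", "U", "C", "C", "U"],
    }
)

# Rows: station, columns: measurement type, values: number of rows
by_station = pd.crosstab(toy["wban"], toy["sur_temp_type"])

# Fraction of each station's rows that are "U"
frac_u = by_station["U"] / by_station.sum(axis=1)

print(by_station)
print(frac_u)
```

In the NRT output above, 60 "U" rows across 15 stations and 4 hourly files would be consistent with each of those stations reporting "U" in every file, though that is an inference from the counts and was not checked here.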