uscrn.get_data

uscrn.get_data(years=None, which='daily', *, station_id=None, cat=False, dropna=False, apply_qc=True, n_jobs=None)

Get USCRN archive data.

Home page: https://www.ncei.noaa.gov/access/crn/
Info: https://www.ncei.noaa.gov/access/crn/qcdatasets.html
Data: https://www.ncei.noaa.gov/pub/data/uscrn/products/

In these datasets, each site's data is stored in a separate file, so loading many sites means reading many files. To quickly get data for all sites over a short, recent period, consider using get_nrt_data() instead.
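Because each site lives in its own file, loading proceeds file by file. The sketch below illustrates that general pattern under stated assumptions: it loads a list of files in parallel and concatenates the results, picking a worker count from the documented n_jobs default, min(cpu_count - 1, num_files). It is not the uscrn implementation. concurrent.futures threads stand in for joblib, os.cpu_count() stands in for joblib.cpu_count(), load_all, default_n_jobs, and fake_load are hypothetical names, and the max(1, ...) guard is an addition of this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def default_n_jobs(num_files: int, cpu_count: Optional[int] = None) -> int:
    """Mirror the documented default: min(cpu_count - 1, num_files)."""
    if cpu_count is None:
        # os.cpu_count() stands in for joblib.cpu_count() in this sketch
        cpu_count = os.cpu_count() or 1
    return min(cpu_count - 1, num_files)


def load_all(
    files: Iterable[str],
    load_file: Callable[[str], List[str]],
    n_jobs: Optional[int] = None,
) -> List[str]:
    """Load each per-site file in parallel, then concatenate rows in input order."""
    files = list(files)
    if n_jobs is None:
        n_jobs = default_n_jobs(len(files))
    # Guard added for this sketch: the formula gives 0 on a 1-CPU machine
    # or for an empty file list, and a pool needs at least one worker.
    workers = max(1, n_jobs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `files`
        parts = list(pool.map(load_file, files))
    return [row for part in parts for row in part]


# Usage with a hypothetical loader that fakes two rows per file
def fake_load(name: str) -> List[str]:
    return [f"{name}:row{i}" for i in range(2)]


rows = load_all(["site_a.txt", "site_b.txt"], fake_load, n_jobs=2)
print(rows)
# ['site_a.txt:row0', 'site_a.txt:row1', 'site_b.txt:row0', 'site_b.txt:row1']
```

Threads are a reasonable stand-in here because fetching remote files is I/O-bound; joblib itself uses worker processes by default.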

Note

Variable and dataset metadata are stored in the returned DataFrame's .attrs dict. They can be preserved by saving the DataFrame to Parquet with the PyArrow engine (requires pandas v2.1+):

df.to_parquet('crn.parquet', engine='pyarrow')

Parameters:

years (int | Iterable[int] | None) – Year(s) to get data for. If None (default), get all available years. If which is 'monthly', years is ignored and you always get all available years.
which (Literal['subhourly', 'hourly', 'daily', 'monthly']) – Which dataset to load.
station_id (str | Iterable[str] | None) – Site or sites (specified using USCRN station ID) to get data for. Default is to get all sites.
cat (bool) – Convert some columns to pandas categorical type.
dropna (bool) – Drop rows in which all data columns are missing.
apply_qc (bool) – Apply the QC flags, masking non-“good” data with NaN. This affects only the subhourly and hourly datasets, and only certain variables.
n_jobs (int | None) – Number of parallel joblib jobs to use for loading the individual files. The default is to use min(joblib.cpu_count() - 1, num_files).
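To make the apply_qc and dropna descriptions concrete, here is a plain-Python sketch of the two steps on rows stored as dicts. The flag-column naming (<var>_flag), the use of "0" as the "good" flag value, and the helper names apply_qc, dropna_all, and is_missing are all assumptions for illustration; check the dataset readme for the real flag encoding. In pandas terms, the dropna step corresponds roughly to df.dropna(how="all", subset=data_cols).

```python
import math
from typing import Dict, List

Row = Dict[str, object]
NAN = float("nan")


def apply_qc(rows: List[Row], qc_vars: List[str], good_flag: str = "0") -> List[Row]:
    """Return copies of `rows` with QC'd variables set to NaN unless their flag is good."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the input rows are not modified
        for var in qc_vars:
            # Assumed naming: the flag for variable `var` is in column `var + "_flag"`
            flag = new.get(var + "_flag")
            if flag is not None and flag != good_flag:
                new[var] = NAN
        out.append(new)
    return out


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def dropna_all(rows: List[Row], data_cols: List[str]) -> List[Row]:
    """Keep only rows where at least one data column has a value."""
    return [r for r in rows if not all(is_missing(r.get(c)) for c in data_cols)]


rows: List[Row] = [
    {"time": "00:00", "t": 12.3, "t_flag": "0", "rh": 80.0},
    {"time": "01:00", "t": 99.9, "t_flag": "3", "rh": None},
]
qc = apply_qc(rows, ["t"])           # second row's "t" becomes NaN
clean = dropna_all(qc, ["t", "rh"])  # second row now has no data, so it is dropped
print([r["time"] for r in clean])    # ['00:00']
```

Note that "time" is deliberately excluded from data_cols, so a row with only a timestamp still counts as having no data.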

Return type:

DataFrame
