uscrn.get_data
- uscrn.get_data(years=None, which='daily', *, station_id=None, cat=False, dropna=False, apply_qc=True, n_jobs=None)
Get USCRN archive data.
Home page: https://www.ncei.noaa.gov/access/crn/
Sites are stored in separate files for these datasets. If you want to quickly get data for all sites for a short, recent period of time, consider using get_nrt_data().
Note
Variable and dataset metadata are included in the .attrs dict. These can be preserved if you have pandas v2.1+ and save the dataframe to Parquet format with the PyArrow engine.
    df.to_parquet('crn.parquet', engine='pyarrow')
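To check that the metadata survived, the file can be read back and its .attrs inspected. The following is a minimal sketch, assuming pandas v2.1+ and PyArrow are installed; the helper names save_crn and load_crn are hypothetical and not part of uscrn.

```python
def save_crn(df, path="crn.parquet"):
    """Write a dataframe to Parquet using the PyArrow engine."""
    df.to_parquet(path, engine="pyarrow")
    return path


def load_crn(path="crn.parquet"):
    """Read the Parquet file back with the PyArrow engine.

    With pandas v2.1+, the returned dataframe's .attrs should match
    what was saved.
    """
    import pandas as pd  # imported lazily; only needed when reading

    return pd.read_parquet(path, engine="pyarrow")
```

For a dataframe df returned by get_data, comparing load_crn(save_crn(df)).attrs with df.attrs is one way to confirm the round trip.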
- Parameters:
years (int | Iterable[int] | None) – Year(s) to get data for. If None (default), get all available years. If which is 'monthly', years is ignored and you always get all available years.
which (Literal['subhourly', 'hourly', 'daily', 'monthly']) – Which dataset.
station_id (str | Iterable[str] | None) – Site or sites (specified using USCRN station ID) to get data for. Default is to get all sites.
cat (bool) – Convert some columns to pandas categorical type.
dropna (bool) – Drop rows where all data columns are missing.
apply_qc (bool) – Apply the QC flags, masking non-“good” data with NaN. This only impacts subhourly and hourly data, and only certain variables.
n_jobs (int | None) – Number of parallel joblib jobs to use for loading the individual files. The default is to use min(joblib.cpu_count() - 1, num_files).
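As an illustration of how these parameters combine, here is a minimal sketch, not taken from the uscrn documentation. The helper names default_n_jobs and get_daily_for_sites are hypothetical, and the station IDs are left for the caller to supply. Note that everything after which is keyword-only in the signature above.

```python
def default_n_jobs(cpu_count, num_files):
    """Mirror the documented default for n_jobs: min(cpu_count - 1, num_files)."""
    return min(cpu_count - 1, num_files)


def get_daily_for_sites(years, station_ids):
    """Fetch daily data for selected sites (requires network access).

    years: an int or an iterable of ints.
    station_ids: a USCRN station ID string or an iterable of them.
    """
    import uscrn  # imported lazily so the sketch can be defined without uscrn

    return uscrn.get_data(
        years,
        "daily",
        station_id=station_ids,  # keyword-only
        cat=True,                # convert some columns to pandas categorical
        dropna=True,             # drop rows where all data columns are missing
    )
```

Restricting station_id keeps the number of files to download small, which in turn caps the default number of parallel jobs.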
- Return type:
See also
- Daily data
Notebook example demonstrating using this function to get a year of daily data.