Daily data

In this example, we load one year (2019) of daily USCRN data and make some static plots with Matplotlib.

import matplotlib.pyplot as plt

import uscrn

Get data

With uscrn.get_data(), we get back a pandas.DataFrame.

%%time

df = uscrn.get_data(2019, "daily")

df

	wban	lst_date	crx_vn	longitude	latitude	t_daily_max	t_daily_min	t_daily_mean	t_daily_avg	p_daily_calc	...	soil_moisture_5_daily	soil_moisture_10_daily	soil_moisture_20_daily	soil_moisture_50_daily	soil_moisture_100_daily	soil_temp_5_daily	soil_temp_10_daily	soil_temp_20_daily	soil_temp_50_daily	soil_temp_100_daily
0	23583	2019-10-09	2.514	-158.610001	59.279999	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	23583	2019-10-10	2.514	-158.610001	59.279999	NaN	NaN	NaN	NaN	0.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	23583	2019-10-11	2.514	-158.610001	59.279999	7.7	-2.4	2.7	2.5	0.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	23583	2019-10-12	2.514	-158.610001	59.279999	6.2	-3.5	1.3	0.8	0.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	23583	2019-10-13	2.514	-158.610001	59.279999	6.3	-2.8	1.7	1.2	0.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
56289	94088	2019-12-27	2.622	-104.440002	44.520000	-0.4	-10.1	-5.2	-4.9	0.0	...	NaN	NaN	0.313	0.348	0.457	-0.5	-0.1	0.3	1.1	2.2
56290	94088	2019-12-28	2.622	-104.440002	44.520000	-5.0	-11.0	-8.0	-8.5	0.0	...	NaN	NaN	NaN	0.347	0.455	-0.7	-0.3	0.2	1.1	2.2
56291	94088	2019-12-29	2.622	-104.440002	44.520000	-9.0	-11.2	-10.1	-10.1	2.8	...	NaN	NaN	NaN	0.346	0.456	-0.8	-0.4	0.1	1.0	2.1
56292	94088	2019-12-30	2.622	-104.440002	44.520000	-5.8	-13.0	-9.4	-9.3	0.0	...	NaN	NaN	NaN	0.346	0.456	-1.1	-0.5	0.1	1.0	2.1
56293	94088	2019-12-31	2.622	-104.440002	44.520000	1.6	-7.6	-3.0	-2.2	0.0	...	NaN	NaN	NaN	0.347	0.454	-1.2	-0.6	0.0	0.9	2.0

56294 rows × 28 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56294 entries, 0 to 56293
Data columns (total 28 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   wban                     56294 non-null  object
 1   lst_date                 56294 non-null  datetime64[ns]
 2   crx_vn                   56024 non-null  object
 3   longitude                56294 non-null  float32
 4   latitude                 56294 non-null  float32
 5   t_daily_max              55580 non-null  float32
 6   t_daily_min              55579 non-null  float32
 7   t_daily_mean             55577 non-null  float32
 8   t_daily_avg              55552 non-null  float32
 9   p_daily_calc             54890 non-null  float32
 10  solarad_daily            50009 non-null  float32
 11  sur_temp_daily_type      56294 non-null  object
 12  sur_temp_daily_max       49745 non-null  float32
 13  sur_temp_daily_min       49746 non-null  float32
 14  sur_temp_daily_avg       49746 non-null  float32
 15  rh_daily_max             49019 non-null  float32
 16  rh_daily_min             49019 non-null  float32
 17  rh_daily_avg             49019 non-null  float32
 18  soil_moisture_5_daily    33154 non-null  float32
 19  soil_moisture_10_daily   35001 non-null  float32
 20  soil_moisture_20_daily   28039 non-null  float32
 21  soil_moisture_50_daily   27850 non-null  float32
 22  soil_moisture_100_daily  27044 non-null  float32
 23  soil_temp_5_daily        39246 non-null  float32
 24  soil_temp_10_daily       39583 non-null  float32
 25  soil_temp_20_daily       31306 non-null  float32
 26  soil_temp_50_daily       30825 non-null  float32
 27  soil_temp_100_daily      30656 non-null  float32
dtypes: datetime64[ns](1), float32(24), object(3)
memory usage: 6.9+ MB
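
The Non-Null Count column above shows that coverage differs considerably between variables; the soil measurements in particular are missing for many rows. As a minimal sketch, the missing fraction per column can be computed directly with pandas. The frame below is a small made-up stand-in for the downloaded data, so the names demo and missing_frac and all values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the USCRN daily frame (values are made up).
demo = pd.DataFrame(
    {
        "wban": ["00001", "00001", "00002", "00002"],
        "t_daily_avg": np.array([2.5, np.nan, -4.9, -8.5], dtype="float32"),
        "soil_temp_5_daily": np.array([np.nan, np.nan, -0.5, np.nan], dtype="float32"),
    }
)

# isna() gives a boolean frame; the column-wise mean of booleans is the
# fraction of missing values. Sort so the sparsest columns come first.
missing_frac = demo.isna().mean().sort_values(ascending=False)
print(missing_frac)
```

With the real data, the same expression would be df.isna().mean().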

Dataset-level and per-variable attributes are stored in df.attrs.

list(df.attrs)

['which', 'title', 'created', 'source', 'attrs', 'notes']

some_keys = ["which", "title", "created", "source"]
width = max(len(key) for key in some_keys) + 1  # pad keys to a common width
for key in some_keys:
    print(f"{key:{width}}", df.attrs[key])

which    daily
title    U.S. Climate Reference Network (USCRN) | daily | 2019
created  2024-01-27 15:50:51.784080+00:00
source   https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01

Plot temperature

vn = "t_daily_avg"

df.attrs["attrs"][vn]

{'name': 't_daily_avg',
 'long_name': 'daily average air temperature',
 'units': 'degree_Celsius',
 'description': 'Average air temperature, in degrees C. See Note F.',
 'dtype': 'float32',
 'xarray_only': False,
 'categories': False}
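
The long_name and units entries shown above are what the y-axis label in the plot is built from. As a rough sketch of that lookup, the example below attaches a hand-built metadata dict with the same nesting to a tiny frame. The frame demo, its values, and the helper axis_label are hypothetical and not part of uscrn.

```python
import pandas as pd

# Tiny made-up frame; in the example above this would be the downloaded data.
demo = pd.DataFrame({"t_daily_avg": [2.5, 0.8, 1.2]})

# Hand-built metadata mirroring the nesting shown above (not fetched from uscrn).
demo.attrs["attrs"] = {
    "t_daily_avg": {
        "long_name": "daily average air temperature",
        "units": "degree_Celsius",
    }
}


def axis_label(frame, name):
    """Build a 'long_name [units]' label from frame.attrs (hypothetical helper)."""
    meta = frame.attrs["attrs"][name]
    return f"{meta['long_name']} [{meta['units']}]"


label = axis_label(demo, "t_daily_avg")
print(label)
```

Note that pandas documents DataFrame.attrs as experimental, so it is safest to read the metadata from the original frame rather than rely on it surviving every operation.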

df_ = df[["wban", "lst_date", "latitude", "longitude", vn]]

fig, ax = plt.subplots(figsize=(10, 5), tight_layout=True)

# Individual site time series (one thin gray line per station)
df_.groupby("wban").plot(
    x="lst_date", y=vn, ax=ax, c="0.7", lw=0.5, alpha=0.35, legend=False
)

# Mean and median across sites for each date
agg = df_.groupby("lst_date")[[vn]].agg(["mean", "median"])
# These are Series indexed by date, so no x/y arguments are needed
agg[vn]["median"].plot(ax=ax, c="0.3", lw=1.8, ls="--", legend=False)
agg[vn]["mean"].plot(ax=ax, c="0.3", lw=2, legend=False)

ax.set_title(df.attrs["title"], loc="left", size=10)

ax.set(
    xlabel="",
    ylabel=f"{df.attrs['attrs'][vn]['long_name']} [{df.attrs['attrs'][vn]['units']}]",
);

[Figure: daily average air temperature time series for each site (light gray), with the all-site mean (solid) and median (dashed).]
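
The mean and median lines come from grouping by date and aggregating with two statistics at once, which yields columns indexed by (variable, statistic). The sketch below shows that structure on a small made-up frame; the site labels and temperatures are illustrative assumptions, not USCRN values.

```python
import pandas as pd

# Synthetic stand-in: three sites reporting on two dates (values are made up).
demo = pd.DataFrame(
    {
        "wban": ["A", "B", "C", "A", "B", "C"],
        "lst_date": pd.to_datetime(["2019-01-01"] * 3 + ["2019-01-02"] * 3),
        "t_daily_avg": [1.0, 2.0, 6.0, -1.0, 0.0, 4.0],
    }
)
vn = "t_daily_avg"

# One row per date; the columns form a MultiIndex of (variable, statistic).
agg = demo.groupby("lst_date")[[vn]].agg(["mean", "median"])
print(agg)

# Selecting the variable first leaves one column per statistic,
# each a Series indexed by date and ready to plot.
mean_by_date = agg[vn]["mean"]
median_by_date = agg[vn]["median"]
```

Selecting with double brackets (`[[vn]]`) keeps the variable name as the top column level, which is why the statistics are then reached as agg[vn]["mean"] and agg[vn]["median"].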