Daily data

In this example, we load a year of daily data and make some static plots with Matplotlib.

import matplotlib.pyplot as plt

import uscrn

Get data

uscrn.get_data() returns a pandas.DataFrame. Here we request the daily data for 2019.

%%time

df = uscrn.get_data(2019, "daily")
Discovering files...
155 files found
https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2019/CRND0103-2019-AK_Aleknagik_1_NNE.txt
...
https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2019/CRND0103-2019-WY_Sundance_8_NNW.txt
Reading files...
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Done   3 tasks      | elapsed:    5.9s
[Parallel(n_jobs=-2)]: Done  10 tasks      | elapsed:    6.0s
[Parallel(n_jobs=-2)]: Done  19 tasks      | elapsed:    6.6s
[Parallel(n_jobs=-2)]: Done  28 tasks      | elapsed:    7.1s
[Parallel(n_jobs=-2)]: Done  39 tasks      | elapsed:    7.6s
[Parallel(n_jobs=-2)]: Done  50 tasks      | elapsed:    8.2s
[Parallel(n_jobs=-2)]: Done  63 tasks      | elapsed:    8.9s
[Parallel(n_jobs=-2)]: Done  76 tasks      | elapsed:    9.7s
[Parallel(n_jobs=-2)]: Done  91 tasks      | elapsed:   10.5s
[Parallel(n_jobs=-2)]: Done 106 tasks      | elapsed:   11.2s
[Parallel(n_jobs=-2)]: Done 123 tasks      | elapsed:   12.2s
[Parallel(n_jobs=-2)]: Done 150 out of 155 | elapsed:   13.5s remaining:    0.4s
[Parallel(n_jobs=-2)]: Done 155 out of 155 | elapsed:   13.8s finished
CPU times: total: 2.08 s
Wall time: 17 s
df
wban lst_date crx_vn longitude latitude t_daily_max t_daily_min t_daily_mean t_daily_avg p_daily_calc ... soil_moisture_5_daily soil_moisture_10_daily soil_moisture_20_daily soil_moisture_50_daily soil_moisture_100_daily soil_temp_5_daily soil_temp_10_daily soil_temp_20_daily soil_temp_50_daily soil_temp_100_daily
0 23583 2019-10-09 2.514 -158.610001 59.279999 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 23583 2019-10-10 2.514 -158.610001 59.279999 NaN NaN NaN NaN 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 23583 2019-10-11 2.514 -158.610001 59.279999 7.7 -2.4 2.7 2.5 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 23583 2019-10-12 2.514 -158.610001 59.279999 6.2 -3.5 1.3 0.8 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 23583 2019-10-13 2.514 -158.610001 59.279999 6.3 -2.8 1.7 1.2 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
56289 94088 2019-12-27 2.622 -104.440002 44.520000 -0.4 -10.1 -5.2 -4.9 0.0 ... NaN NaN 0.313 0.348 0.457 -0.5 -0.1 0.3 1.1 2.2
56290 94088 2019-12-28 2.622 -104.440002 44.520000 -5.0 -11.0 -8.0 -8.5 0.0 ... NaN NaN NaN 0.347 0.455 -0.7 -0.3 0.2 1.1 2.2
56291 94088 2019-12-29 2.622 -104.440002 44.520000 -9.0 -11.2 -10.1 -10.1 2.8 ... NaN NaN NaN 0.346 0.456 -0.8 -0.4 0.1 1.0 2.1
56292 94088 2019-12-30 2.622 -104.440002 44.520000 -5.8 -13.0 -9.4 -9.3 0.0 ... NaN NaN NaN 0.346 0.456 -1.1 -0.5 0.1 1.0 2.1
56293 94088 2019-12-31 2.622 -104.440002 44.520000 1.6 -7.6 -3.0 -2.2 0.0 ... NaN NaN NaN 0.347 0.454 -1.2 -0.6 0.0 0.9 2.0

56294 rows × 28 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56294 entries, 0 to 56293
Data columns (total 28 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   wban                     56294 non-null  object        
 1   lst_date                 56294 non-null  datetime64[ns]
 2   crx_vn                   56024 non-null  object        
 3   longitude                56294 non-null  float32       
 4   latitude                 56294 non-null  float32       
 5   t_daily_max              55580 non-null  float32       
 6   t_daily_min              55579 non-null  float32       
 7   t_daily_mean             55577 non-null  float32       
 8   t_daily_avg              55552 non-null  float32       
 9   p_daily_calc             54890 non-null  float32       
 10  solarad_daily            50009 non-null  float32       
 11  sur_temp_daily_type      56294 non-null  object        
 12  sur_temp_daily_max       49745 non-null  float32       
 13  sur_temp_daily_min       49746 non-null  float32       
 14  sur_temp_daily_avg       49746 non-null  float32       
 15  rh_daily_max             49019 non-null  float32       
 16  rh_daily_min             49019 non-null  float32       
 17  rh_daily_avg             49019 non-null  float32       
 18  soil_moisture_5_daily    33154 non-null  float32       
 19  soil_moisture_10_daily   35001 non-null  float32       
 20  soil_moisture_20_daily   28039 non-null  float32       
 21  soil_moisture_50_daily   27850 non-null  float32       
 22  soil_moisture_100_daily  27044 non-null  float32       
 23  soil_temp_5_daily        39246 non-null  float32       
 24  soil_temp_10_daily       39583 non-null  float32       
 25  soil_temp_20_daily       31306 non-null  float32       
 26  soil_temp_50_daily       30825 non-null  float32       
 27  soil_temp_100_daily      30656 non-null  float32       
dtypes: datetime64[ns](1), float32(24), object(3)
memory usage: 6.9+ MB
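
The non-null counts above show that coverage differs a lot between variables: the soil columns have far fewer values than air temperature. As a minimal sketch, assuming a small made-up frame in place of the real df (the name toy and all values are invented for illustration), the fraction of missing values per column could be computed like this:

```python
import numpy as np
import pandas as pd

# Toy stand-in for `df`; the values are invented for illustration only
toy = pd.DataFrame(
    {
        "t_daily_avg": [2.5, 0.8, np.nan, 1.2],
        "soil_temp_5_daily": [np.nan, np.nan, -0.5, np.nan],
    }
)

# Fraction of missing values per column, largest first
missing_frac = toy.isna().mean().sort_values(ascending=False)
print(missing_frac)
```

The same two calls should work unchanged on the real df, since they only rely on standard pandas methods.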

Dataset-level attributes are stored in df.attrs. Per-variable attributes are nested under df.attrs["attrs"], keyed by column name.

list(df.attrs)
['which', 'title', 'created', 'source', 'attrs', 'notes']
some_keys = ["which", "title", "created", "source"]
width = max(len(key) for key in some_keys) + 1  # pad keys to a common width
for key in some_keys:
    print(f"{key:{width}}", df.attrs[key])
which    daily
title    U.S. Climate Reference Network (USCRN) | daily | 2019
created  2024-01-27 15:50:51.784080+00:00
source   https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01
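
These attributes live in the standard pandas DataFrame.attrs dictionary. The following is a minimal sketch of the layout, using a made-up frame named toy rather than the real df: dataset-level entries sit at the top level, and per-variable metadata is nested under the "attrs" key. The title is invented; the long_name and units strings follow the t_daily_avg metadata shown in this example.

```python
import pandas as pd

# Toy frame; in the real case these attributes are set by uscrn.get_data()
toy = pd.DataFrame({"t_daily_avg": [2.5, 0.8]})
toy.attrs["title"] = "toy dataset"  # dataset-level attribute (invented)
toy.attrs["attrs"] = {
    # Per-variable metadata, keyed by column name
    "t_daily_avg": {
        "long_name": "daily average air temperature",
        "units": "degree_Celsius",
    },
}

# Build an axis label from the variable metadata
var_attrs = toy.attrs["attrs"]["t_daily_avg"]
label = f"{var_attrs['long_name']} [{var_attrs['units']}]"
print(label)
```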

Plot temperature

vn = "t_daily_avg"

df.attrs["attrs"][vn]
{'name': 't_daily_avg',
 'long_name': 'daily average air temperature',
 'units': 'degree_Celsius',
 'description': 'Average air temperature, in degrees C. See Note F.',
 'dtype': 'float32',
 'xarray_only': False,
 'categories': False}
df_ = df[["wban", "lst_date", "latitude", "longitude", vn]]

fig, ax = plt.subplots(figsize=(10, 5), tight_layout=True)

# Individual site time series
df_.groupby("wban").plot(x="lst_date", y=vn, ax=ax, c="0.7", lw=0.5, alpha=0.35, legend=False)

# Mean and median across sites for each date
agg = df_.groupby("lst_date")[[vn]].agg(["mean", "median"])
# These are Series indexed by date, so no x/y arguments are needed
agg[vn]["median"].plot(ax=ax, c="0.3", lw=1.8, ls="--", legend=False)
agg[vn]["mean"].plot(ax=ax, c="0.3", lw=2, legend=False)

ax.set_title(df.attrs["title"], loc="left", size=10)

ax.set(
    xlabel="",
    ylabel=f"{df.attrs['attrs'][vn]['long_name']} [{df.attrs['attrs'][vn]['units']}]",
);
[Figure: daily average air temperature time series for each site (light gray), with the all-site mean (solid) and median (dashed).]
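
To make the aggregation step in the plotting code easier to follow, here is a minimal sketch on a made-up long-format frame (two hypothetical sites "A" and "B", two days, invented values). Grouping by date and aggregating with two statistics gives one row per date and a two-level column index of (variable, statistic).

```python
import pandas as pd

# Toy long-format data: two sites, two days (values invented)
toy = pd.DataFrame(
    {
        "wban": ["A", "B", "A", "B"],
        "lst_date": pd.to_datetime(
            ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"]
        ),
        "t_daily_avg": [1.0, 3.0, 2.0, 6.0],
    }
)

vn = "t_daily_avg"

# One row per date; columns are a MultiIndex of (variable, statistic)
agg = toy.groupby("lst_date")[[vn]].agg(["mean", "median"])
print(agg)

# Selecting the variable level, then the statistic, gives a date-indexed Series
daily_mean = agg[vn]["mean"]
```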

Soil with xarray

With uscrn.to_xarray(), we convert the DataFrame to an xarray.Dataset.

ds = uscrn.to_xarray(df)

# Shorten the notes so they take up less space when viewing the Dataset
lines = ds.attrs["notes"].splitlines()
for i, line in enumerate(lines):
    if line.lstrip().startswith("D."):
        break
else:
    i = 7
ds.attrs["notes"] = "\n".join(lines[:i] + [f"... ({len(lines) - i} more lines)"])

ds
<xarray.Dataset>
Dimensions:              (site: 155, time: 365, depth: 5)
Coordinates:
    wban                 (site) object '03047' '03048' ... '96408' '96409'
  * time                 (time) datetime64[ns] 2019-01-01 ... 2019-12-31
    longitude            (site) float32 -102.8 -106.9 -102.8 ... -150.9 -149.4
    latitude             (site) float32 31.62 34.36 33.96 ... 66.56 63.45 68.65
  * depth                (depth) float64 5.0 10.0 20.0 50.0 100.0
Dimensions without coordinates: site
Data variables: (12/16)
    crx_vn               (site, time) object '2.622' '2.622' ... '2.514' '2.514'
    t_daily_max          (site, time) float32 6.1 0.0 12.5 ... -26.4 -28.9 -29.4
    t_daily_min          (site, time) float32 -3.4 -3.2 -5.4 ... -32.8 -31.1
    t_daily_mean         (site, time) float32 1.3 -1.6 3.6 ... -30.4 -30.8 -30.3
    t_daily_avg          (site, time) float32 -1.5 -1.6 3.4 ... -30.7 -30.1
    p_daily_calc         (site, time) float32 0.0 0.0 0.0 0.0 ... 0.0 0.4 3.0
    ...                   ...
    sur_temp_daily_avg   (site, time) float32 -0.2 0.4 2.6 ... -32.3 -29.0 -28.8
    rh_daily_max         (site, time) float32 84.4 92.4 94.0 ... 75.1 81.4 79.0
    rh_daily_min         (site, time) float32 47.4 71.7 17.7 ... 51.6 68.7 68.6
    rh_daily_avg         (site, time) float32 75.6 81.8 54.7 ... 65.1 77.0 76.2
    soil_moisture_daily  (depth, site, time) float32 0.048 0.048 ... nan nan
    soil_temp_daily      (depth, site, time) float32 3.9 3.2 4.7 ... nan nan nan
Attributes:
    title:    U.S. Climate Reference Network (USCRN) | daily | 2019
    created:  2024-01-27 15:50:56.334536+00:00
    source:   https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01
    notes:    Notes from https://www.ncei.noaa.gov/pub/data/uscrn/products/da...
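
The note-shortening loop above relies on Python's for/else: the else branch runs only if the loop finishes without hitting break. Below is a minimal, self-contained sketch of the same pattern. The notes text is made up for illustration and does not reproduce the real USCRN notes.

```python
# Made-up notes text standing in for ds.attrs["notes"]
notes = "\n".join(
    [
        "Notes from <source>",
        "",
        "A. first note",
        "B. second note",
        "C. third note",
        "D. fourth note",
        "E. fifth note",
    ]
)

lines = notes.splitlines()
for i, line in enumerate(lines):
    if line.lstrip().startswith("D."):
        break  # `i` is now the index of the first "D." line
else:
    i = 7  # fallback, used only when no line starts with "D."

# Keep everything before the cut and summarize the rest
short = "\n".join(lines[:i] + [f"... ({len(lines) - i} more lines)"])
print(short)
```
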
ds.soil_temp_daily.mean(dim="site", keep_attrs=True).plot.contourf(levels=20, size=4.5, aspect=2.5)

ax = plt.gca()
ax.invert_yaxis()

ax.set_title(f"{ds.title}\nAll-site average", loc="left", size=10)

ax.set(xlabel="");
[Figure: filled contour plot of the all-site average soil temperature as a function of time and depth, with depth increasing downward.]
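
As a minimal sketch of what the site average does before plotting, the toy array below (invented values, two depths, two sites, two times) is reduced over the site dimension. It assumes only standard xarray behavior: missing values are skipped by default for floating-point data, and keep_attrs=True carries the variable attributes over to the result.

```python
import numpy as np
import xarray as xr

# Toy (depth, site, time) array; values and attrs are invented for illustration
da = xr.DataArray(
    np.array(
        [
            [[1.0, 2.0], [3.0, np.nan]],  # depth 5
            [[5.0, 6.0], [7.0, 8.0]],  # depth 10
        ]
    ),
    dims=("depth", "site", "time"),
    coords={"depth": [5.0, 10.0]},
    attrs={"units": "degree_Celsius"},
    name="soil_temp_daily",
)

# Average over sites; NaNs are skipped by default for float data
site_mean = da.mean(dim="site", keep_attrs=True)
print(site_mean.values)
print(site_mean.attrs)
```

The result has dimensions (depth, time), which is why the contour plot above has time on one axis and depth on the other.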