Select sites


With uscrn.get_data(), you can select specific sites (using the station_id argument) to avoid downloading data you don't need.

import matplotlib.pyplot as plt

import uscrn

We can use the site metadata from uscrn.load_meta() to decide which sites to load.

meta = uscrn.load_meta()
meta.head()
wban country state location vector name latitude longitude elevation status commissioning closing operation pairing network station_id
0 03047 US TX Monahans 6 ENE Sandhills State Park 31.62 -102.80 2724.0 Commissioned 2004-01-12 NaT Operational NaN USCRN 1019
1 03048 US NM Socorro 20 N Sevilleta National Wildlife Refuge (LTER Site) 34.35 -106.88 4847.0 Commissioned 2004-01-12 NaT Operational NaN USCRN 1020
2 03054 US TX Muleshoe 19 S Muleshoe National Wildlife Refuge (Headquarter... 33.95 -102.77 3742.0 Commissioned 2004-04-23 NaT Operational NaN USCRN 1067
3 03055 US OK Goodwell 2 E OK Panhandle Research & Extn. Center (Native ... 36.59 -101.59 3266.0 Commissioned 2004-04-23 NaT Operational NaN USCRN 1068
4 03060 US CO Montrose 11 ENE Black Canyon of the Gunnison National Park (Ve... 38.54 -107.69 8402.0 Commissioned 2004-09-08 NaT Operational NaN USCRN 1109
meta.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 255 entries, 0 to 254
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   wban           238 non-null    object        
 1   country        255 non-null    object        
 2   state          255 non-null    object        
 3   location       255 non-null    object        
 4   vector         255 non-null    object        
 5   name           255 non-null    object        
 6   latitude       255 non-null    float64       
 7   longitude      255 non-null    float64       
 8   elevation      254 non-null    float64       
 9   status         255 non-null    object        
 10  commissioning  151 non-null    datetime64[ns]
 11  closing        79 non-null     datetime64[ns]
 12  operation      255 non-null    object        
 13  pairing        14 non-null     object        
 14  network        255 non-null    object        
 15  station_id     255 non-null    object        
dtypes: datetime64[ns](2), float64(3), object(11)
memory usage: 32.0+ KB
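Columns such as state, latitude, longitude, operation and station_id are enough to build a site selection. Here is a minimal sketch. It builds a small stand-in for meta from the five rows printed by meta.head() above, so it runs without network access, and keeps the operational sites inside a latitude/longitude box. The box limits and the names meta_sample and box_ids are illustrative assumptions, not part of uscrn.

```python
import pandas as pd

# Small stand-in for uscrn.load_meta(), using rows copied from meta.head() above
meta_sample = pd.DataFrame(
    {
        "state": ["TX", "NM", "TX", "OK", "CO"],
        "location": ["Monahans", "Socorro", "Muleshoe", "Goodwell", "Montrose"],
        "latitude": [31.62, 34.35, 33.95, 36.59, 38.54],
        "longitude": [-102.80, -106.88, -102.77, -101.59, -107.69],
        "operation": ["Operational"] * 5,
        "station_id": ["1019", "1020", "1067", "1068", "1109"],
    }
)

# Hypothetical bounding box, in degrees
south, north = 33.0, 37.0
west, east = -104.0, -100.0

# Boolean masks: inside the box (inclusive bounds) and currently operational
in_box = meta_sample.latitude.between(south, north) & meta_sample.longitude.between(
    west, east
)
is_operational = meta_sample.operation == "Operational"

box_ids = meta_sample.loc[in_box & is_operational, "station_id"].tolist()
print(box_ids)
```

With the real metadata, use meta in place of meta_sample. The resulting list of IDs can then be passed as station_id to uscrn.get_data(), as in the state example further down.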

Single site

(
    meta.query("state == 'CO' and operation == 'Operational'")
    .sort_values(by="location")
)
wban country state location vector name latitude longitude elevation status commissioning closing operation pairing network station_id
212 94075 US CO Boulder 14 W Mountain Research Station INSTAAR Univ. of CO ... 40.03 -105.54 9828.0 Commissioned 2004-01-12 NaT Operational NaN USCRN 1045
5 03061 US CO Cortez 8 SE Mesa Verde National Park (Far View Site) 37.25 -108.50 8034.0 Commissioned 2006-01-06 NaT Operational NaN USCRN 1232
218 94082 US CO Dinosaur 2 E Dinosaur National Monument (Hdq. Maintenance S... 40.24 -108.96 6062.0 Commissioned 2004-09-08 NaT Operational NaN USCRN 1108
7 03063 US CO La Junta 17 WSW USDA Comanche National Grassland 37.86 -103.82 4386.0 Commissioned 2004-09-08 NaT Operational NaN USCRN 1110
4 03060 US CO Montrose 11 ENE Black Canyon of the Gunnison National Park (Ve... 38.54 -107.69 8402.0 Commissioned 2004-09-08 NaT Operational NaN USCRN 1109
211 94074 US CO Nunn 7 NNE Ag. Res. Svc. Central Plains Exp. Range (SGS L... 40.80 -104.75 5390.0 Commissioned 2004-01-12 NaT Operational NaN USCRN 1014
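The table shows that Boulder 14 W has station_id 1045, which is the ID used in the next cell. You can also do this lookup in code. The sketch below uses a hypothetical helper, lookup_station_id (not a uscrn function), that filters on state and location and raises unless exactly one site matches. The frame co_sites is a stand-in built from the Colorado rows above.

```python
import pandas as pd

# Stand-in for the Colorado rows of uscrn.load_meta() shown above
co_sites = pd.DataFrame(
    {
        "state": ["CO"] * 6,
        "location": ["Boulder", "Cortez", "Dinosaur", "La Junta", "Montrose", "Nunn"],
        "operation": ["Operational"] * 6,
        "station_id": ["1045", "1232", "1108", "1110", "1109", "1014"],
    }
)


def lookup_station_id(meta: pd.DataFrame, state: str, location: str) -> str:
    """Hypothetical helper: return the one station_id matching state and location."""
    matches = meta.loc[
        (meta.state == state) & (meta.location == location), "station_id"
    ]
    # Refuse to guess if the location is missing or ambiguous
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one site for {location}, {state}; found {len(matches)}"
        )
    return matches.iloc[0]


print(lookup_station_id(co_sites, "CO", "Boulder"))
```

Some locations may host more than one site, distinguished by vector, or may include closed sites. With the full metadata you might therefore need to add vector or operation to the filter. The helper raises in that case instead of silently picking one.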
%%time

station_id = "1045"  # Boulder, CO

assert meta.station_id.nunique() == len(meta)
assert meta.set_index("station_id").at[station_id, "location"] == "Boulder"

df = uscrn.get_data(range(2015, 2025), "daily", station_id=station_id, n_jobs=2)


Discovering files...
10 file(s) found
https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2015/CRND0103-2015-CO_Boulder_14_W.txt
...
https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2024/CRND0103-2024-CO_Boulder_14_W.txt
Reading files...
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    1.6s
[Parallel(n_jobs=2)]: Done   4 tasks      | elapsed:    1.9s
CPU times: user 474 ms, sys: 38.2 ms, total: 512 ms
Wall time: 3.78 s
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    2.6s finished
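The log suggests that get_data first discovers one file per site and year (10 years for one site here, 1 year for 8 sites in the Texas example below). The file names encode the product, year, state and site. Below is a sketch of splitting such a URL into those parts. The regular expression is inferred only from the names printed on this page, so files for other sites may not match it, and parse_daily_filename is a hypothetical helper, not part of uscrn.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed pattern, inferred from the daily01 file names shown on this page:
# CRND0103-<year>-<STATE>_<Location_words>_<distance>_<direction>.txt
PATTERN = re.compile(
    r"^(?P<product>CRND\d{4})-(?P<year>\d{4})-(?P<state>[A-Z]{2})_"
    r"(?P<location>.+)_(?P<distance>[\d.]+)_(?P<direction>[NSEW]+)\.txt$"
)


def parse_daily_filename(url: str) -> dict:
    """Split a USCRN daily file URL into its named parts (hypothetical helper)."""
    name = PurePosixPath(urlparse(url).path).name
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name}")
    parts = m.groupdict()
    parts["year"] = int(parts["year"])
    # Multi-word locations use underscores in the file name
    parts["location"] = parts["location"].replace("_", " ")
    return parts


for url in [
    "https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2015/CRND0103-2015-CO_Boulder_14_W.txt",
    "https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2023/CRND0103-2023-TX_Port_Aransas_32_NNE.txt",
]:
    print(parse_daily_filename(url))
```

Because the location group is greedy but must be followed by a numeric distance and a compass direction, multi-word names such as Port_Aransas are captured whole.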
df
wban lst_date crx_vn longitude latitude t_daily_max t_daily_min t_daily_mean t_daily_avg p_daily_calc ... soil_moisture_5_daily soil_moisture_10_daily soil_moisture_20_daily soil_moisture_50_daily soil_moisture_100_daily soil_temp_5_daily soil_temp_10_daily soil_temp_20_daily soil_temp_50_daily soil_temp_100_daily
0 94075 2015-01-01 2.423 -105.540001 40.040001 -6.7 -20.6 -13.7 -13.4 0.0 ... NaN NaN NaN NaN NaN -3.1 -2.2 NaN NaN NaN
1 94075 2015-01-02 2.423 -105.540001 40.040001 -3.0 -16.5 -9.7 -8.9 0.0 ... NaN NaN NaN NaN NaN -2.8 -2.1 NaN NaN NaN
2 94075 2015-01-03 2.423 -105.540001 40.040001 -4.2 -13.0 -8.6 -8.5 1.6 ... NaN NaN NaN NaN NaN -2.7 -2.1 NaN NaN NaN
3 94075 2015-01-04 2.423 -105.540001 40.040001 -4.8 -13.0 -8.9 -8.4 2.0 ... NaN NaN NaN NaN NaN -3.1 -2.2 NaN NaN NaN
4 94075 2015-01-05 2.423 -105.540001 40.040001 0.8 -5.0 -2.1 -1.4 2.0 ... NaN NaN NaN NaN NaN -2.8 -2.3 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3648 94075 2024-12-27 2.623 -105.540001 40.040001 -3.5 -9.4 -6.5 -6.5 8.4 ... NaN NaN NaN NaN NaN -2.3 -1.7 NaN NaN NaN
3649 94075 2024-12-28 2.623 -105.540001 40.040001 -2.2 -6.2 -4.2 -4.1 9.1 ... NaN NaN NaN NaN NaN -2.3 -1.6 NaN NaN NaN
3650 94075 2024-12-29 2.623 -105.540001 40.040001 2.0 -3.1 -0.5 -0.3 11.3 ... NaN NaN NaN NaN NaN -1.9 -1.4 NaN NaN NaN
3651 94075 2024-12-30 2.623 -105.540001 40.040001 -2.8 -13.1 -8.0 -8.6 4.5 ... NaN NaN NaN NaN NaN -1.7 -1.2 NaN NaN NaN
3652 94075 2024-12-31 2.623 -105.540001 40.040001 -8.1 -19.1 -13.6 -13.0 0.2 ... NaN NaN NaN NaN NaN -2.4 -1.7 NaN NaN NaN

3653 rows × 28 columns

vn = "t_daily_max"

attrs = df.attrs["attrs"][vn]
s = df.set_index("lst_date")[vn]

_, ax = plt.subplots(figsize=(9, 4))
s.plot(ax=ax, lw=0.5, alpha=0.35, color="C0")
s.rolling("30D").mean().plot(ax=ax, color="C0")
ax.set_xlabel("")
ax.set_ylabel(f"{attrs['long_name']}\n[{attrs['units']}]");
Figure: t_daily_max at Boulder, CO (station 1045), 2015 to 2024, showing daily values (faint line) and the 30-day rolling mean.
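s.rolling("30D") uses a time-based window. Each value is the mean of the observations dated within the 30 days ending at that row (the window is closed on the right), not the mean of the previous 30 rows. The two differ when days are missing. Here is a small comparison on toy values (not USCRN data), using a 3-day window so the effect is easy to see:

```python
import pandas as pd

# Toy daily series with a gap between Jan 3 and Jan 10 (illustrative values)
s_demo = pd.Series(
    [1.0, 2.0, 3.0, 10.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-10"]),
)

# Time-based window: averages observations dated in (t - 3 days, t]
by_time = s_demo.rolling("3D").mean()

# Row-based window: averages the last 3 rows, whatever their dates
by_rows = s_demo.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"by_time": by_time, "by_rows": by_rows}))
```

On Jan 10 the time-based mean uses only that day's value, while the row-based mean still reaches back to Jan 2. For time-based windows min_periods defaults to 1, so the first points of the 30-day curve average fewer than 30 days.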

Sites in a state

%%time

station_ids = meta.query("state == 'TX'").station_id.tolist()
print(station_ids)

df = uscrn.get_data(2023, "daily", station_id=station_ids, n_jobs=2)


['1019', '1067', '1130', '1066', '1306', '1387', '1386', '1018']
Discovering files...
8 file(s) found
https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2023/CRND0103-2023-TX_Austin_33_NW.txt
...
https://www.ncei.noaa.gov/pub/data/uscrn/products/daily01/2023/CRND0103-2023-TX_Port_Aransas_32_NNE.txt
Reading files...
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done   4 tasks      | elapsed:    0.5s
[Parallel(n_jobs=2)]: Done   6 out of   8 | elapsed:    0.8s remaining:    0.3s
CPU times: user 171 ms, sys: 8.63 ms, total: 180 ms
Wall time: 1.68 s
[Parallel(n_jobs=2)]: Done   8 out of   8 | elapsed:    1.0s finished
df
wban lst_date crx_vn longitude latitude t_daily_max t_daily_min t_daily_mean t_daily_avg p_daily_calc ... soil_moisture_5_daily soil_moisture_10_daily soil_moisture_20_daily soil_moisture_50_daily soil_moisture_100_daily soil_temp_5_daily soil_temp_10_daily soil_temp_20_daily soil_temp_50_daily soil_temp_100_daily
0 23907 2023-01-01 2.623 -98.080002 30.620001 25.100000 12.3 18.700001 18.500000 0.0 ... 0.309 0.334 NaN NaN NaN 13.7 13.5 NaN NaN NaN
1 23907 2023-01-02 2.623 -98.080002 30.620001 25.799999 16.4 21.100000 20.799999 0.0 ... 0.312 0.334 NaN NaN NaN 15.8 15.2 NaN NaN NaN
2 23907 2023-01-03 2.623 -98.080002 30.620001 21.299999 12.1 16.700001 16.000000 0.0 ... 0.306 0.333 NaN NaN NaN 14.5 15.0 NaN NaN NaN
3 23907 2023-01-04 2.623 -98.080002 30.620001 20.600000 9.2 14.900000 14.400000 0.0 ... 0.299 0.330 NaN NaN NaN 13.3 14.1 NaN NaN NaN
4 23907 2023-01-05 2.623 -98.080002 30.620001 20.000000 5.1 12.600000 12.100000 0.0 ... 0.294 0.327 NaN NaN NaN 12.4 13.3 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2915 23906 2023-12-27 2.622 -96.820000 28.299999 21.600000 2.1 11.800000 11.100000 0.0 ... 0.078 0.079 0.039 0.062 0.081 15.0 15.8 17.200001 18.000000 19.299999
2916 23906 2023-12-28 2.622 -96.820000 28.299999 16.600000 0.6 8.600000 7.900000 0.0 ... 0.077 0.078 0.039 0.062 0.081 13.4 14.5 16.500000 17.400000 19.100000
2917 23906 2023-12-29 2.622 -96.820000 28.299999 17.700001 -0.4 8.600000 7.500000 0.0 ... 0.075 0.076 0.039 0.062 0.081 12.6 13.6 15.700000 16.799999 18.700001
2918 23906 2023-12-30 2.622 -96.820000 28.299999 21.400000 -1.2 10.100000 8.600000 0.0 ... 0.074 0.075 0.039 0.062 0.081 12.0 13.0 15.000000 16.200001 18.299999
2919 23906 2023-12-31 2.622 -96.820000 28.299999 23.799999 2.0 12.900000 13.900000 0.0 ... 0.073 0.074 0.040 0.062 0.080 13.2 13.5 14.800000 15.800000 17.900000

2920 rows × 28 columns
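meta.query() can also reference a Python list with @. This makes it easy to select several states at once and keep only operational sites. The sketch below uses a small stand-in frame built from the meta.head() rows. The choice of states and the names meta_sample and multi_state_ids are illustrative.

```python
import pandas as pd

# Stand-in for uscrn.load_meta(), using rows copied from meta.head()
meta_sample = pd.DataFrame(
    {
        "state": ["TX", "NM", "TX", "OK", "CO"],
        "operation": ["Operational"] * 5,
        "station_id": ["1019", "1020", "1067", "1068", "1109"],
    }
)

states = ["TX", "OK"]  # hypothetical selection

# "@states" refers to the local Python variable inside the query string
multi_state_ids = meta_sample.query(
    "state in @states and operation == 'Operational'"
).station_id.tolist()
print(multi_state_ids)
```

As with the Texas example above, the list can be passed directly as station_id to uscrn.get_data().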

vn = "t_daily_max"

attrs = df.attrs["attrs"][vn]

(
    df.assign(rounded_latitude=df.latitude.round(1).astype(str))
    .boxplot(vn, by="rounded_latitude")
)
plt.gca().set_ylabel(f"{attrs['long_name']}\n[{attrs['units']}]");
Figure: box plots of 2023 t_daily_max for the Texas sites, grouped by site latitude rounded to 0.1°.
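Each box summarizes its latitude group by the median and quartiles. The sketch below computes those numbers directly with groupby, using a few t_daily_max values copied from the table above (the first three days at 30.62° N and the last three at 28.30° N). df_demo is a small stand-in, not the full frame.

```python
import pandas as pd

# Stand-in rows copied (rounded) from the 2023 Texas output above
df_demo = pd.DataFrame(
    {
        "latitude": [30.62, 30.62, 30.62, 28.30, 28.30, 28.30],
        "t_daily_max": [25.1, 25.8, 21.3, 21.6, 16.6, 17.7],
    }
)

# Same grouping as the box plot, but returning the quartiles as numbers
summary = (
    df_demo.assign(rounded_latitude=df_demo.latitude.round(1))
    .groupby("rounded_latitude")["t_daily_max"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)
```

Because sites are grouped by latitude rounded to 0.1°, two sites whose latitudes round to the same value would share one box. Grouping by wban instead keeps every site separate.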