Matthew Epland, PhD
This project provides live water pressure measurements via a web dashboard running on a Raspberry Pi, logs the data, and creates time series forecasts of future water pressure.
Living in a 5th floor walk-up in NYC can save you on rent and gym memberships,
but runs the risk of leaving you high and dry when your water pressure gives out!
The pressure delivered from the city’s water mains is typically sufficient to reach the 6th floor,
with higher buildings needing a booster pump and one of NYC’s iconic rooftop water towers.
My building lacks a pump and water tower,
leaving my top floor apartment with just barely satisfactory pressure,
as long as no other units are using water!
As you can see in the data below,
my daytime water pressure is all over the place.
After being stranded soapy and cold halfway through a shower one too many times,
I decided to use my data science and electronics skills to record
the time series of my apartment’s hot water pressure
with the goal of forecasting future availability,
and hence chance_of_showers was born!
WIP
Below is a sample of the pressure data collected in November 2023. Clicking the links will open interactive Plotly plots; please explore!
The data acquisition (DAQ) system saves the raw pressure data from the analog to digital converter (ADC) as an integer between 0 and 65472. Note that occasionally a water hammer will increase the pressure above its steady state value, marked by the orange 100% reference line, with a subsequent decay on the order of 10 minutes. When water is flowing at the pressure sensor, the data is shown with an open purple marker. Using water reduces the pressure slightly under normal conditions, and abruptly ends overpressure events.
To clean the data before fitting any models, I rescale the values to lie between 0 and 1, using the steady state extrema as the endpoints. Any values outside the normalization range are capped at 0 or 1.
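The rescaling and capping described above can be sketched as a small min-max normalization with clipping. This is only an illustration: the function name and the two steady state extrema below are hypothetical values in raw ADC counts, not the ones used in the project.

```python
def normalize_pressure(raw: int, steady_min: int, steady_max: int) -> float:
    """Rescale a raw ADC reading to [0, 1] and cap out-of-range values."""
    if steady_max <= steady_min:
        raise ValueError("steady_max must be greater than steady_min")
    scaled = (raw - steady_min) / (steady_max - steady_min)
    # Cap overpressure events (e.g. water hammers) at 1 and low readings at 0
    return min(1.0, max(0.0, scaled))


# Hypothetical steady state extrema, in raw ADC counts (assumed for illustration)
STEADY_MIN = 6000
STEADY_MAX = 21000

readings = [5000, 6000, 13500, 21000, 30000]
normalized = [normalize_pressure(r, STEADY_MIN, STEADY_MAX) for r in readings]
print(normalized)
```

Readings below the lower extremum map to 0, readings above the 100% reference map to 1, and everything in between scales linearly.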
Here is a list of the components I used in my build. With suitable alterations, the project could definitely be carried out with a wide array of other sensors, single board computers or microcontrollers, plumbing supplies, etc.
The circuit diagram for this implementation is provided as a KiCad schematic here.
The DAQ system recorded 95.4% of possible data points overall, and 99.870% since implementing the cron job heartbeat monitoring.
The provided start_daq bash script
will start the daq.py and fan_control.py scripts in new tmux windows.
You will need to update the pkg_path variable in start_daq
per your installation location.
source daq/start_daq
If daq: {display_web: true} is set in config.yaml,
the local IP address and port of the dashboard will be logged on DAQ startup.
Open this link in your browser to see the live dashboard, as shown in the introduction.
Jobs to restart the DAQ on boot and every 30 minutes,
as well as to send heartbeat API calls (see below),
are provided in the cron_jobs.txt file.
Note that loading this file with crontab will overwrite any current cron jobs,
so check your existing settings first with crontab -l!
crontab -l
crontab daq/cron_jobs.txt
You can verify the cron jobs are running as expected with:
grep CRON /var/log/syslog | grep $LOGNAME
You can use the provided heartbeat bash script to send heartbeat API calls
for the DAQ script to healthchecks.io for monitoring and alerting.
Configure your alert online at healthchecks.io,
and then run the commands below to set up a secrets.json file with your alert’s uuid.
You will need to update the pkg_path variable in heartbeat
per your installation location.
The provided cron_jobs.txt will set up a cron job
to send the heartbeat at minutes 15 and 45 of each hour.
sudo apt install -y jq
echo -e "{\n\t\"chance_of_showers_heartbeat_uuid\": \"YOUR_UUID_HERE\"\n}" > secrets.json
source daq/heartbeat
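For readers who prefer Python to bash and jq, the same heartbeat idea can be sketched as follows. This is not the project's heartbeat script, just a minimal stand-in: it reads the chance_of_showers_heartbeat_uuid key from secrets.json and builds a healthchecks.io ping URL on hc-ping.com. The function names are hypothetical.

```python
import json
import urllib.request
from pathlib import Path

HC_PING_BASE = "https://hc-ping.com"


def load_uuid(secrets_path: str, key: str = "chance_of_showers_heartbeat_uuid") -> str:
    """Read the check's uuid from the secrets.json file."""
    with Path(secrets_path).open(encoding="utf-8") as f_secrets:
        return json.load(f_secrets)[key]


def build_ping_url(uuid: str, failed: bool = False) -> str:
    """Build the ping URL; appending /fail reports a failure instead of success."""
    url = f"{HC_PING_BASE}/{uuid}"
    return f"{url}/fail" if failed else url


def send_heartbeat(uuid: str, timeout: float = 10.0) -> int:
    """Send the heartbeat and return the HTTP status code."""
    with urllib.request.urlopen(build_ping_url(uuid), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Prints the URL only; call send_heartbeat() to actually ping the service
    print(build_ping_url("YOUR_UUID_HERE"))
```

Keeping the URL construction separate from the network call makes it easy to check the configuration without sending a real ping.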
The heartbeat script has also been set up to back up
the daq/raw_data and daq/saved_data directories to
backup_path="/media/usb_drive/daq_backup".
Please configure backup_path to fit your path,
or comment out the rsync lines to turn them off.
Regular backups of the data to a separate drive are helpful as Raspberry Pis
have been known to corrupt their SD cards due to power loss or excessive writes.
Raw CSV files can be combined into convenient Parquet files
prior to analysis with the etl.py script.
If the script crashes, you may need to manually repair
any lines in the CSV files corrupted due to power losses.
Polars should generate error messages indicating
the corrupt datetime to help you locate the problematic file and line.
python daq/etl.py
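When hunting for corrupted lines by hand, a small standard-library helper along these lines may save some time. It is a sketch under stated assumptions, not part of etl.py: the datetime_utc column name, the datetime format, and the absence of quoted commas in the raw CSV files are all assumptions to adjust for the real files.

```python
from datetime import datetime
from pathlib import Path

# Assumed timestamp format; change to match the raw CSV files
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_corrupt_lines(csv_path: str, datetime_column: str = "datetime_utc") -> list[tuple[int, str]]:
    """Return (line number, reason) for each line that fails basic checks."""
    bad_lines = []
    with Path(csv_path).open(encoding="utf-8", errors="replace") as f_csv:
        header = f_csv.readline().rstrip("\r\n").split(",")
        i_datetime = header.index(datetime_column)
        # The header is line 1, so data lines start at line 2
        for line_num, line in enumerate(f_csv, start=2):
            if not line.strip():
                continue  # Ignore blank lines
            fields = line.rstrip("\r\n").split(",")
            if len(fields) != len(header):
                bad_lines.append((line_num, "wrong number of fields"))
                continue
            try:
                datetime.strptime(fields[i_datetime], DATETIME_FORMAT)
            except ValueError:
                bad_lines.append((line_num, f"bad datetime {fields[i_datetime]!r}"))
    return bad_lines
```

Running it over each raw CSV file gives the line numbers to inspect before rerunning the ETL.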
To optimize the many hyperparameters present in this project,
both of the individual forecasting models themselves as well as how the data is prepared,
Bayesian optimization was used to efficiently sample the parameter space.
The functions needed to run Bayesian optimization are located in bayesian_opt.py,
and make use of the bayesian-optimization library.
Unfortunately, actually running the optimization over GPU accelerated models
is not as simple as calling the run_bayesian_opt() function.
I have been unable to successfully detach the training of one GPU accelerated model
from the next when training multiple models in a loop.
The second training session will still have access to the tensors of the first,
leading to out of GPU memory errors,
even when using commands like gc.collect() and torch.cuda.empty_cache().
The torch models created by darts are very convenient,
but do not provide as much configurability as building your own torch model from scratch,
leaving me unable to fix this issue in a clean way.
To work around the GPU memory issues, a shell script, start_bayesian_opt,
is used to repeatedly call run_bayesian_opt() via the bayesian_opt_runner.py script.
In this way each model is trained in its own Python session,
totally clearing memory between training iterations.
A signed pickle file is used to quickly load the necessary data and settings on each iteration.
Instructions for running the whole Bayesian optimization workflow are provided below.
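The one-session-per-model idea can be illustrated in Python with subprocess, although the project itself uses a shell script for this. The sketch below is an assumption-laden stand-in: run_in_fresh_session and run_iterations are hypothetical names, and a print statement takes the place of the real training code.

```python
import subprocess
import sys


def run_in_fresh_session(code: str, timeout: float = 60.0) -> int:
    """Run code in a brand new Python interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # Do not raise on failure, the caller inspects the exit code
    )
    return result.returncode


def run_iterations(n_iter: int) -> list[int]:
    """Run each iteration in its own process so no memory is shared between them."""
    exit_codes = []
    for i_iter in range(n_iter):
        # Placeholder for one training iteration
        training_code = f"print('training iteration {i_iter}')"
        exit_codes.append(run_in_fresh_session(training_code))
    return exit_codes


if __name__ == "__main__":
    print(run_iterations(3))
```

Because every iteration exits before the next begins, the operating system reclaims all of its memory. On POSIX systems a negative exit code indicates the child was terminated by a signal, e.g. -9 for SIGKILL.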
Some hyperparameter points chosen for testing by the optimizer result in crashes during training,
either due to memory limitations, or invalid parameter combinations that slip by pre-run checks.
In most cases these exceptions can be caught within Python,
allowing the point to be automatically logged as having the worst possible value, BAD_TARGET = -999.0.
However, the Python process itself is occasionally killed
by the operating system with an uncatchable SIGKILL signal,
likely due to a request for too much memory.
Resuming the run will only result in repeated crashes on the same point as it is never written to disk.
Rather than rework the bayesian-optimization library to pre-register points before optimization,
a manual_bad_point.py script is included
to easily log the few such points by hand.
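The catchable case can be sketched as a thin wrapper around the training call. Only BAD_TARGET = -999.0 comes from the text above; safe_objective, toy_train, and the input_len and output_len parameters are hypothetical names used for illustration.

```python
from typing import Callable

BAD_TARGET = -999.0


def safe_objective(train_fn: Callable[..., float], **hyperparams: float) -> float:
    """Return the training score, or BAD_TARGET if training raises an exception."""
    try:
        return float(train_fn(**hyperparams))
    except (ValueError, RuntimeError, MemoryError) as error:
        print(f"Training failed for {hyperparams}: {error!r}")
        return BAD_TARGET


def toy_train(input_len: int, output_len: int) -> float:
    """Stand-in for a real training run, with one invalid parameter combination."""
    if output_len > input_len:
        raise ValueError("output_len must not exceed input_len")
    return -float(output_len)


if __name__ == "__main__":
    print(safe_objective(toy_train, input_len=40, output_len=10))
    print(safe_objective(toy_train, input_len=10, output_len=20))
```

A wrapper like this cannot help when the process receives SIGKILL, since no Python code runs after the kill, which is why such points need to be logged by hand.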
1. Create the parent_wrapper.pickle file for bayesian_opt_runner.py via the exploratory_ana.py notebook.
2. Configure the run in start_bayesian_opt and bayesian_opt_runner.py.
3. If the process is killed on a point, log it with manual_bad_point.py and resume the search.
./ana/start_bayesian_opt 2>&1 | tee ana/models/bayesian_optimization/bayesian_opt.log
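The signed pickle file mentioned earlier could be produced in several ways; the source does not say which one the project uses. One common approach, shown here purely as an assumption, is to prepend an HMAC-SHA256 signature to the pickled bytes and verify it before unpickling. The function names and the key are hypothetical.

```python
import hashlib
import hmac
import pickle
from pathlib import Path
from typing import Any

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def write_signed_pickle(obj: Any, path: str, key: bytes) -> None:
    """Pickle obj and write it to path, prefixed by its HMAC-SHA256 signature."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    Path(path).write_bytes(signature + payload)


def read_signed_pickle(path: str, key: bytes) -> Any:
    """Verify the signature, then unpickle; raise ValueError on a mismatch."""
    data = Path(path).read_bytes()
    signature, payload = data[:DIGEST_SIZE], data[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Signature mismatch, refusing to unpickle")
    return pickle.loads(payload)
```

Checking the signature first matters because unpickling untrusted bytes can execute arbitrary code.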
1. Find the supported CUDA version (11.8.0) for the current release of PyTorch (2.0.1) here.
2. Update the pytorch-gpu-src source to point to the correct PyTorch version in pyproject.toml. This takes the place of the pip install --index-url=... command provided by the PyTorch installation instructions.
3. Install the ana group with make setupANA. This will install pytorch, along with the other necessary packages.
4. Check that PyTorch and CUDA are configured correctly with the following python commands:
import torch

if torch.cuda.is_available():
    print("CUDA is available")
    print(f"Device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
else:
    print("CUDA IS NOT AVAILABLE!")
If Python 3.11 is not available in your release of Raspbian,
you can compile it from source following the instructions here,
but will also need to install the sqlite extensions:
cd /usr/src/
sudo wget https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz
sudo tar -xzvf Python-3.11.4.tgz
cd Python-3.11.4/
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y build-essential libbz2-dev libc6-dev libexpat1-dev libffi-dev libgdbm-dev liblzma-dev libncurses5-dev libnss3-dev libsqlite3-dev libssl-dev lzma pkg-config zlib1g-dev
sudo apt autoremove -y
sudo apt update && sudo apt full-upgrade -y
./configure --enable-optimizations --enable-loadable-sqlite-extensions
sudo make altinstall
# Should be Python 3.11.4 with your compile info
/usr/local/bin/python3.11 -VV
# Link binary
sudo rm /usr/bin/python
sudo rm /usr/bin/python3
sudo ln -s /usr/local/bin/python3.11 /usr/bin/python
sudo ln -s /usr/local/bin/python3.11 /usr/bin/python3
# Should match /usr/local/bin/python3.11 -VV
python -VV
To finish setting up the DAQ system you must also:
- Install tmux, which is not included in Raspbian by default. tmux is used to control multiple terminal sessions in start_daq.
- Install pigpio, which is not included in Raspbian Lite, i.e. headless, installations. pigpio is necessary to interface with the GPIO ports and must also be enabled via a daemon.
- Enable SPI, I2C, and Remote GPIO via raspi-config.
- Install log2ram to avoid unnecessary writes to the SD card, prolonging the card’s lifetime.
# Install tmux and pigpio
sudo apt-get install -y tmux pigpio
# Enable SPI, I2C, and Remote GPIO
sudo raspi-config
# Setup pigpio daemon
sudo systemctl enable pigpiod
# Prevent the WiFi from powering off
# Above the line that says exit 0 insert `/sbin/iw wlan0 set power_save off` and save the file
sudo vi /etc/rc.local
# Install log2ram
echo "deb [signed-by=/usr/share/keyrings/azlux-archive-keyring.gpg] http://packages.azlux.fr/debian/ bullseye main" | sudo tee /etc/apt/sources.list.d/azlux.list
sudo wget -O /usr/share/keyrings/azlux-archive-keyring.gpg https://azlux.fr/repo.gpg
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y log2ram
Install poetry following the instructions here.
curl -sSL https://install.python-poetry.org | python3 -
Then install the python packages needed for this installation. Groups include:
- daq for packages needed to run the DAQ script on a Raspberry Pi, optional
- web for packages needed to run the live dashboard from the DAQ script, optional
- ana for analysis tools, optional
- dev for continuous integration (CI) and linting tools
poetry install --with daq,web
or
poetry install --with ana
It is recommended to use the pre-commit tool
to automatically check your commits locally as they are created.
You should just need to install the git hook scripts, see below,
after installing the dev dependencies.
This will run the checks in .pre-commit-config.yaml when you create a new commit.
pre-commit install
Markdown is linted using markdownlint-cli, JavaScript by standard,
and HTML, SCSS, CSS, and TOML by prettier.
You can install these JavaScript-based linters globally with:
sudo npm install --global markdownlint-cli standard prettier
sudo npm install --global --save-dev --save-exact prettier-plugin-toml
Shell files are linted using shellcheck and shfmt.
Follow the linked installation instructions for your system.
On Fedora they are:
sudo dnf install ShellCheck shfmt
A Makefile is provided for convenience,
with commands to set up the DAQ and analysis environments,
make setupDAQ and make setupANA,
as well as to run CI and linting tools,
e.g. make black, make pylint, make pre-commit.