the basic path
- make a jupyterlab docker container!
  - volume mount for notebooks
  - volume for training data?
- get a model working in a jupyter notebook
- make a dockerfile to set up the base image
- script to train and validate the model, packaging up the trained model alongside a report of its accuracy etc. (training sketch after this list)
  - include the package requirements in the artifact
- take that trained model and put it in a flask or fastAPI container (serving sketch below)
- how about using my laptop docker-compose pytest script thing? that's pretty great for what i need
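a minimal sketch of the train-validate-package step -- the dataset, model choice, and artifact layout here are stand-ins, not decisions from these notes:

```python
# train_and_package.py -- hypothetical name: train, validate, and bundle the model
# together with an accuracy report and the exact package versions used
import json
import subprocess
from pathlib import Path

import joblib
from sklearn.datasets import load_iris  # stand-in dataset
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# the artifact is model + report + frozen requirements, per the list above
artifact = Path("artifact")
artifact.mkdir(exist_ok=True)
joblib.dump(model, artifact / "model.joblib")
(artifact / "report.json").write_text(json.dumps({"accuracy": float(accuracy)}))
(artifact / "requirements.txt").write_bytes(subprocess.check_output(["pip", "freeze"]))
```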
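and a sketch of the serving side, assuming the artifact layout above lands in a fastAPI container (the input schema is a placeholder); run it with `uvicorn serve:app`:

```python
# serve.py -- hypothetical: load the packaged model and expose a /predict endpoint
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("artifact/model.joblib")  # baked into the image at build time

class Features(BaseModel):
    values: list[float]  # placeholder input schema

@app.post("/predict")
def predict(features: Features):
    # scikit-learn expects a 2D array: one row per sample
    return {"prediction": model.predict([features.values]).tolist()}
```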
next steps
- get the data into redis for training -- or another data store? (redis sketch after this list)
- how about a clean path from jupyter to docker?
- CI/CD: get it working with git and webhooks
  - git workflow with webhooks
- what is a good project model? here's a guess:
  - base dockerfiles for conda and build scripts (?)
  - model definition
  - dataset definitions
  - rest API
  - test container
- whylogs to track model drift?
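if redis does become the training data store, one possible shape is a list of JSON-encoded samples -- a sketch where the key name, the encoding, and the `redis` hostname (docker-compose service name) are all assumptions:

```python
# hypothetical redis-backed training data access
import json
import redis

r = redis.Redis(host="redis", port=6379)  # "redis" = docker-compose service name

def push_sample(features: list, label: int) -> None:
    r.rpush("training:samples", json.dumps({"features": features, "label": label}))

def load_samples() -> tuple[list, list]:
    rows = [json.loads(raw) for raw in r.lrange("training:samples", 0, -1)]
    return [row["features"] for row in rows], [row["label"] for row in rows]
```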
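on the whylogs idea: the basic loop is to profile each batch and compare profiles against a training-time baseline. a sketch assuming the whylogs v1 API (`why.log` plus the local writer); worth double-checking against current docs:

```python
# profile a batch of serving inputs and persist the profile for drift comparison
import pandas as pd
import whylogs as why

batch = pd.DataFrame({"feature_a": [0.1, 0.2], "feature_b": [1, 3]})  # placeholder data

results = why.log(batch)         # profile one batch
results.writer("local").write()  # persist; compare against the training baseline later
```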
questions
- how do we get from jupyter to docker? what's a clean path? (nbconvert sketch after this list)
- what's a good practice for making training data available? just have a redis container?
- does it make sense to have each component be a separate CI/CD-managed project?
  - data ingest (for each dataset?)
  - feature extraction
  - labeling
  - jupyter notebook for experimentation
  - converter for jupyter -> app model (or do we just rely on people copy-pasting?)
  - model trainer (+ verification?)
  - maybe we just define a production application and worry about the rest later?
    - model + API (or messaging system)
- what's a good way to get project dependencies into a jupyter container?
- does a setup like the above help with parameter optimization? (optimizer sketch below)
  - should an optimizer be its own containerized service?
- how can we reuse feature extraction for regular model input? (shared-module sketch below)
  - should a feature extractor be a service?
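on the jupyter -> docker question: one clean-ish path is to export the notebook to a plain script as a build step, so the image never contains a notebook. a sketch using nbconvert's python API (file names are placeholders; `jupyter nbconvert --to script experiment.ipynb` does the same from the CLI):

```python
# export a notebook to a plain python script as part of the image build
from pathlib import Path
from nbconvert import PythonExporter

body, _resources = PythonExporter().from_filename("experiment.ipynb")
Path("train.py").write_text(body)
```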
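on parameter optimization: once training is a plain function, something optuna-shaped can wrap it -- in-process first, maybe its own service later. a sketch with a toy objective standing in for the real train/validate call:

```python
# toy optuna sketch -- the objective would call the real training step
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    # stand-in for: train with lr, return validation accuracy
    return -(lr - 0.01) ** 2

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```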
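and for feature-extraction reuse: the cheapest answer is a shared module that both the trainer and the serving app import, promoted to a service only if it needs to be. a sketch (module and function names are made up):

```python
# features.py -- hypothetical shared module, imported by both training and serving
# so the two can never disagree about how features are computed
def extract_features(record: dict) -> list[float]:
    # placeholder logic; the real extractor lives here, versioned with the model
    a, b = float(record["a"]), float(record["b"])
    return [a, b, a * b]
```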
What needs doing?
- step one is writing out all these steps!
- docker image build pipeline:
  - miniconda image
  - how do we pull in the requirements for a particular model?
    - honestly, just ship a Dockerfile alongside the model (generator sketch after this list)
    - but it needs to pull from a manifest that an individual developer using something like jupyter would use
  - verify this part of the pipeline according to the training materials
- feature extraction
- training phase
- packaging up a pre-trained model artifact
- CI/CD
  - [ ] webhook plugin for gitea? https://github.com/adnanh/webhook (fallback receiver sketch below)
- missing from the course????
  - training data store!
  - labeling phase (is that the right term?) -- unattended?
  - feature extraction phase
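for the "Dockerfile alongside the model" idea: generate it from the same requirements file the jupyter environment installs from, so the developer's environment and the image can't drift apart. a sketch where the base image, file names, and serve command are all assumptions:

```python
# build_dockerfile.py -- hypothetical generator: emit a Dockerfile pinned to the
# same requirements.txt the jupyter environment uses
from pathlib import Path

DOCKERFILE = """\
FROM continuumio/miniconda3
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY artifact/ ./artifact/
COPY serve.py .
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
"""

Path("Dockerfile").write_text(DOCKERFILE)
```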
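if the webhook plugin doesn't pan out, a fallback receiver is small. gitea signs payloads with HMAC-SHA256 in the X-Gitea-Signature header (verify against the docs for the gitea version in use); the endpoint path, secret, and retrain command are placeholders:

```python
# minimal gitea webhook receiver sketch -- an alternative to adnanh/webhook
import hashlib
import hmac
import subprocess

from fastapi import FastAPI, Header, HTTPException, Request

SECRET = b"change-me"  # must match the secret configured on the gitea webhook
app = FastAPI()

@app.post("/hooks/retrain")
async def retrain(request: Request, x_gitea_signature: str = Header(default="")):
    body = await request.body()
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_gitea_signature):
        raise HTTPException(status_code=403, detail="bad signature")
    subprocess.Popen(["make", "retrain"])  # placeholder for kicking off the pipeline
    return {"status": "started"}
```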