# mlops/ROADMAP.md

## the basic path
- [X] make a jupyterlab docker container!
- [ ] volume mount for notebooks
- [ ] volume for training data?
- [ ] get a model working in a jupyter notebook
- [ ] make a dockerfile to set up the base image
- [ ] script to train and validate the model, packaging the trained model alongside a report of its accuracy etc. (see the first sketch after this list)
- [ ] include the package requirements in the artifact
- [ ] take that trained model and put it in a flask or fastAPI container (see the serving sketch after this list)
- [ ] how about reusing my laptop docker-compose pytest setup? that's pretty great for what I need
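
A rough sketch of the train-and-package script, assuming scikit-learn and a CSV with a `label` column; the file names, artifact layout, and model choice are all placeholders, not decisions:

```python
"""Train and validate a model, then package it with a report and requirements."""
import json
import pathlib
import subprocess
import sys

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ARTIFACT_DIR = pathlib.Path("artifact")  # placeholder output location
ARTIFACT_DIR.mkdir(exist_ok=True)

# assumed input format: feature columns plus a "label" column
df = pd.read_csv("training_data.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"], test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# validation report, shipped alongside the model
report = {"accuracy": accuracy_score(y_test, model.predict(X_test))}
(ARTIFACT_DIR / "report.json").write_text(json.dumps(report, indent=2))

# the trained model itself
joblib.dump(model, ARTIFACT_DIR / "model.joblib")

# pin the exact package versions the model was trained with
freeze = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout
(ARTIFACT_DIR / "requirements.txt").write_text(freeze)
```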
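
And a matching serving sketch for the fastAPI option, assuming the artifact layout above; the container would `pip install -r artifact/requirements.txt` and run this under uvicorn:

```python
"""Serve the packaged model; run with `uvicorn serve:app` in the container."""
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("artifact/model.joblib")  # path from the packaging sketch above


class Features(BaseModel):
    # invented schema -- this has to match the model's real feature columns
    values: list[float]


@app.post("/predict")
def predict(features: Features) -> dict:
    prediction = model.predict([features.values])[0]
    return {"prediction": str(prediction)}
```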
## next steps
- [ ] get the data into redis for training -- or another data store? (see the redis sketch after this list)
- [ ] how about a clean path from jupyter to docker?
- [ ] CI/CD: get a git workflow working with webhooks
- [ ] what is a good project model? here's a guess:
  - base dockerfiles for conda and build scripts (?)
  - model definition
  - dataset definitions
  - rest API
  - test container
- [ ] whylogs to track model drift? (see the whylogs sketch after this list)
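
For the redis question, a minimal redis-py sketch, assuming rows are small enough to live as JSON strings in a list (the key name is made up):

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)  # assumes a redis container on localhost

# ingest side: push each training row as a JSON string
r.rpush("training:rows", json.dumps({"feature_a": 1.0, "label": "spam"}))

# training side: pull the whole set back out
rows = [json.loads(raw) for raw in r.lrange("training:rows", 0, -1)]
```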
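
For the whylogs item, a drift-check sketch under the assumption that comparing a reference profile against a fresh batch is enough to start with (file names are placeholders):

```python
import pandas as pd
import whylogs as why

# profile the data the model was trained on (the reference)
reference = why.log(pd.read_csv("training_data.csv")).view()

# profile a fresh batch of production inputs
current = why.log(pd.read_csv("todays_inputs.csv")).view()

# compare summary statistics per column; whylogs also has fancier
# drift reports, but eyeballing the means is a start
ref_stats = reference.to_pandas()
cur_stats = current.to_pandas()
print(ref_stats["distribution/mean"].compare(cur_stats["distribution/mean"]))
```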
## questions
- [ ] how do we get from jupyter to docker? what's a clean path? (see the nbconvert sketch after this list)
- [ ] what's a good practice for making training data available? Just have a redis container?
- [ ] does it make sense to have each component be a separate CI/CD-managed project?
  - data ingest (for each dataset?)
  - feature extraction
  - labeling
  - jupyter notebook for experimentation
  - converter for jupyter -> app model (or do we just rely on people copy-pasting?)
  - model trainer (+ verification?)
- [ ] maybe we just define a production application and worry about the rest later?
  - model + API (or messaging system)
- [ ] what's a good way to get project dependencies into a jupyter container?
- [ ] does a setup like the above help with parameter optimization?
- [ ] should an optimizer be its own containerized service? (see the optuna sketch after this list)
- [ ] how can we reuse feature extraction for regular model input? (see the feature-extraction sketch after this list)
- [ ] should a feature extractor be a service?
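
One candidate for the clean path: nbconvert's Python API, so the notebook stays the single source of truth and the build step derives a plain script from it. A sketch (notebook and script names invented):

```python
import pathlib

import nbformat
from nbconvert import PythonExporter

# read the notebook and emit an equivalent .py script at build time
notebook = nbformat.read("experiment.ipynb", as_version=4)
source, _resources = PythonExporter().from_notebook_node(notebook)
pathlib.Path("train.py").write_text(source)
```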
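
On the optimizer question: a library like optuna is cheap to embed in the training script first, and only needs to become its own service once trials should fan out across machines. A sketch against the same assumed data format as the training sketch above:

```python
import optuna
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# same assumed input format as the training sketch above
df = pd.read_csv("training_data.csv")
X, y = df.drop(columns=["label"]), df["label"]


def objective(trial: optuna.Trial) -> float:
    # one invented hyperparameter, just to show the shape of a search
    n_estimators = trial.suggest_int("n_estimators", 10, 200)
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```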
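
On reusing feature extraction: the cheap version is a plain importable function shared by the trainer and the serving app, promoted to a service only if more than one model needs it over the wire. A sketch with invented features:

```python
import pandas as pd


def extract_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Imported by both the training script and the serving app, so the
    model always sees features computed exactly the same way."""
    features = pd.DataFrame(index=raw.index)
    features["text_len"] = raw["text"].str.len()  # invented features
    features["word_count"] = raw["text"].str.split().str.len()
    return features
```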
## What needs doing?
- [ ] step one is writing out all these steps!
- [ ] docker image build pipeline:
  - [ ] miniconda image
  - [ ] how do we pull in the requirements for a particular model?
    - honestly, just ship a Dockerfile alongside the model
    - but it needs to pull from the same manifest that an individual developer working in jupyter would use (see the manifest check sketch after this list)
- [ ] verify this part of the pipeline according to the training materials:
  - [ ] feature extraction
  - [ ] training phase
  - [ ] packaging up a pre-trained model artifact
- [ ] CI/CD
  - [ ] webhook plugin for gitea? https://github.com/adnanh/webhook (see the receiver sketch after this list)
- [ ] missing from the course?
  - [ ] training data store!
  - [ ] labeling phase (is that the right term?) -- unattended?
  - [ ] feature extraction phase
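
For the manifest question above, one possible approach (nothing canonical): a small check script that both the jupyter container and CI run against the same requirements.txt, so the developer environment and the built image can't silently diverge.

```python
"""Fail if the running environment drifts from the shared manifest."""
import sys
from importlib.metadata import PackageNotFoundError, version


def check_manifest(path: str = "requirements.txt") -> bool:
    ok = True
    for line in open(path):
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # only pinned entries are checked
        name, _, pinned = line.partition("==")
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"missing: {name}")
            ok = False
            continue
        if installed != pinned:
            print(f"mismatch: {name} installed={installed} pinned={pinned}")
            ok = False
    return ok


if __name__ == "__main__":
    sys.exit(0 if check_manifest() else 1)
```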
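
If the gitea webhook plugin route stalls, a receiver is small enough to hand-roll. A FastAPI sketch that kicks off a hypothetical `build.sh` on push events (the payload field should be verified against the gitea docs):

```python
import subprocess

from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/hooks/build")
async def on_push(request: Request) -> dict:
    payload = await request.json()
    # gitea push payloads carry the clone URL under repository.clone_url
    # (assumption -- check the gitea webhook docs before relying on it)
    repo_url = payload["repository"]["clone_url"]
    subprocess.Popen(["./build.sh", repo_url])  # hypothetical build entry point
    return {"status": "accepted"}
```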
## references
- https://github.com/docker-science/cookiecutter-docker-science?tab=readme-ov-file
- https://docker-science.github.io/