mlops/ROADMAP.md

the basic path

  • make a jupyterlab docker container! (compose sketch below)

    • volume mount for notebooks
    • volume for training data?
  • get a model working in jupyter notebook
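
a minimal compose sketch for the jupyterlab container above, assuming the stock jupyter/scipy-notebook image -- the host paths ./notebooks and ./data are made-up placeholders:

```yaml
# docker-compose.yml -- sketch only; image tag and host paths are assumptions
services:
  jupyterlab:
    image: jupyter/scipy-notebook:latest
    ports:
      - "8888:8888"                     # jupyter's default port
    volumes:
      - ./notebooks:/home/jovyan/work   # notebook volume mount
      - ./data:/home/jovyan/data        # training data volume
```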

  • make a dockerfile to set up the base image

  • script to train and validate the model, packaging up the trained model alongside a report of its accuracy etc. (sketch below)

    • include the package requirements in the artifact
  • take that trained model and put it in a Flask or FastAPI container (serving sketch at the end of this list)
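
a sketch of what the train-and-package script could look like, assuming scikit-learn and a joblib artifact -- the dataset, model choice, and artifact/ layout are placeholders:

```python
# train_and_package.py -- sketch; dataset, model, and paths are placeholders
import json
import subprocess
import sys
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ARTIFACT_DIR = Path("artifact")
ARTIFACT_DIR.mkdir(exist_ok=True)

# train and validate
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# package the trained model alongside a report of its accuracy
joblib.dump(model, ARTIFACT_DIR / "model.joblib")
(ARTIFACT_DIR / "report.json").write_text(json.dumps({"accuracy": accuracy}))

# include the package requirements in the artifact
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout
(ARTIFACT_DIR / "requirements.txt").write_text(frozen)
```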

  • how about using my laptop-docker-compose pytest script thing? that's pretty great for what i need
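
and a matching serving sketch for the Flask/FastAPI bullet above, assuming FastAPI plus the joblib artifact from the packaging script -- run with `uvicorn serve:app`:

```python
# serve.py -- sketch; model path and feature shape are assumptions
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("artifact/model.joblib")  # the packaged artifact

class PredictRequest(BaseModel):
    features: List[float]

@app.post("/predict")
def predict(req: PredictRequest):
    prediction = model.predict([req.features])[0]
    return {"prediction": int(prediction)}
```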

next steps

  • get the data into redis for training -- or another data store? (redis sketch after this list)
  • how about a clean path from jupyter to docker?
  • CI/CD: git workflow with webhooks
    • what is a good project model? here's a guess:
      • base dockerfiles for conda and build scripts (?)
      • model definition
      • dataset definitions
      • rest API
      • test container
  • whylogs to track model drift?
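
one way the redis step could look, assuming a redis service (e.g. from the compose file) and datasets stored as serialized numpy blobs -- the host name and key scheme are made up:

```python
# data_store.py -- sketch; host name and key scheme are assumptions
import io

import numpy as np
import redis

r = redis.Redis(host="redis", port=6379)

def put_dataset(key: str, X: np.ndarray, y: np.ndarray) -> None:
    """Serialize both arrays into one blob and stash it in redis."""
    buf = io.BytesIO()
    np.savez(buf, X=X, y=y)
    r.set(key, buf.getvalue())

def get_dataset(key: str):
    """Pull the blob back out for training."""
    loaded = np.load(io.BytesIO(r.get(key)))
    return loaded["X"], loaded["y"]
```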

questions

  • how do we get from jupyter to docker? what's a clean path?

  • what's a good practice for making training data available? Just have a redis container?

  • does it make sense to have each component be a separate CI/CD managed project?

    • data ingest (for each dataset?)
    • feature extraction
    • labeling
    • jupyter notebook for experimentation
    • converter for jupyter -> app model (or do we just rely on people copy-pasting?)
    • model trainer (+verification?)
    • maybe we just define a production application and worry about the rest later?
      • model + API (or messaging system)
  • what's a good way to get project dependencies into a jupyter container?
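
on the dependencies question, one guess: bake them into a derived image -- the base tag and the requirements.txt name are assumptions:

```dockerfile
# Dockerfile.jupyter -- sketch; extends a stock image with project deps
FROM jupyter/scipy-notebook:latest
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```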

  • does a setup like above help with parameter optimization?

    • should an optimizer be its own containerized service?
  • how can we reuse feature extraction for regular model input?

    • should a feature extractor be a service?
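
on reusing feature extraction: even before deciding on a service, keeping the extractor in a shared module that both the trainer and the API import stops train-time and inference-time features from diverging -- module, field, and function names here are made up:

```python
# features.py -- sketch of a shared extraction module
import numpy as np

def extract_features(raw: dict) -> np.ndarray:
    """Turn one raw record into the model's input vector.

    Imported by both the training script and the serving API so the
    two can't drift apart.
    """
    return np.array([raw["length"], raw["width"], raw["length"] * raw["width"]])
```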

What needs doing?

  • step one is writing out all these steps!

  • docker image build pipeline:

    • miniconda image
    • how do we pull in the requirements for a particular model?
      • honestly just ship a Dockerfile alongside the model
        • but it needs to pull from the same dependency manifest that an individual developer working in something like jupyter would use
        • verify this part of the pipeline according to the training materials
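
a sketch of the ship-a-Dockerfile-alongside-the-model idea, assuming a conda environment.yml as the shared manifest (the same file a developer's jupyter setup would install from); the paths and the uvicorn entrypoint are assumptions:

```dockerfile
# Dockerfile.model -- sketch; shipped next to the trained model artifact
FROM continuumio/miniconda3:latest
COPY environment.yml /tmp/environment.yml   # shared manifest (must include fastapi/uvicorn)
RUN conda env update -n base -f /tmp/environment.yml && conda clean -afy
COPY artifact/ /opt/model/artifact/         # model.joblib, report.json, requirements.txt
COPY serve.py /opt/model/serve.py
WORKDIR /opt/model
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```
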
  • feature extraction

  • training phase

  • packaging up a pre-trained model artifact

  • CI/CD: webhook plugin for gitea? https://github.com/adnanh/webhook
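
if adnanh/webhook pans out, the hook definition is a small JSON file, roughly this shape (the id and paths are made up); gitea's webhook would then POST to http://host:9000/hooks/retrain-model on webhook's default port:

```json
[
  {
    "id": "retrain-model",
    "execute-command": "/opt/mlops/retrain.sh",
    "command-working-directory": "/opt/mlops"
  }
]
```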

  • missing from the course?

    • training data store!
    • labeling phase (is that the right term?) -- unattended?
    • feature extraction phase

refer: