Bumping this. See my thread.
Ooba's textgen UI --> SillyTavern API
(or, directly into the Ooba API)
You can run local models uncensored and get exactly the output you want for Super Deepthroat. SillyTavern keeps track of the context of what is happening in the scene, and you can also give it a scenario, among other things.
The main work would be an SDT engine that:
1) Is time-based, meaning it runs a timeline that makes a new API request every, e.g., 10 seconds. The timer would be paused while generating and awaiting the result from the API, or the API could stream the tokens back to the application as they're generated.
2) Enables voice generation via an online voice engine or a locally hosted voice model.
3) Could be trained on SDT actions, including Dialogue actions, to make the scene change dynamically, assuming creators had a way to provide some sort of "repository" that could be quickly queried and chosen from, whether by the AI or by the scene.
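
To make points 1 and 3 a bit more concrete, here's a rough Python sketch of the timer idea. It is not tied to any real SDT, Ooba, or SillyTavern API: `SceneTimer`, `pick_action`, and the `generate` callback are all names I made up for illustration, and `generate` stands in for whatever call actually hits the backend. The only point is that time spent waiting on the model doesn't count toward the next interval, and that the reply gets matched against a creator-supplied action list.

```python
import time
from typing import Callable, Dict, List, Optional


class SceneTimer:
    """Hypothetical scene clock that is paused while a reply is being generated."""

    def __init__(self, interval: float,
                 generate: Callable[[List[str]], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval      # seconds of scene time between requests
        self.generate = generate      # assumed: blocking call to the text backend
        self.clock = clock            # injectable so the timer can be tested
        self.elapsed = 0.0            # scene time accumulated since the last request
        self.history: List[str] = []  # earlier replies, passed back as context
        self._last = clock()

    def tick(self) -> Optional[str]:
        """Call once per frame; returns a new reply when the interval is up."""
        now = self.clock()
        self.elapsed += now - self._last
        self._last = now
        if self.elapsed < self.interval:
            return None
        reply = self.generate(self.history)  # blocks until the backend answers
        self.history.append(reply)
        # Restart from the current time so the wait itself is not counted.
        self.elapsed = 0.0
        self._last = self.clock()
        return reply


def pick_action(reply: str, actions: Dict[str, str]) -> Optional[str]:
    """Return the first repository action whose name appears in the reply."""
    text = reply.lower()
    for name in actions:
        if name.lower() in text:
            return name
    return None
```

In a real build, `tick()` would be called from the game loop and `generate` would wrap the HTTP request (or consume a token stream instead of blocking). The action matching here is plain substring search; a real repository would probably want something stricter, like having the model answer with an exact action ID.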