The long-term goal is to make a backend like this:
- **Server/Client Design:** Clients can be UIs designed for this backend, or bridges to other apps like Blender nodes, Kdenlive clips, effects, etc. Currently using Flask with Flask-SocketIO since it's very fast to work with.
- **Job Management:** Generate some data or transform other data. Currently it's a simple queue. In the future it could be scaled up to defer to multiple backend nodes, such as a cluster of GPUs, a horde, etc.
- **Plugin Ecosystem:** A plugin is a wrapper around models, packages, techniques, features, etc. It handles all installation for its libraries and implements backend jobs. A CLI script wizard instantly creates a new plugin so you can start working on it. This acts a bit like a package manager for AI art: implement all your ideas and favorite models into stable-core to benefit from multiple GUIs, and chain them with other community plugins, all designed for creative coding. Installation and repositories are all managed by each plugin, so there's no need to think about this stuff anymore.
- **Cloud Deploy:** Instantly render on runpod or vast.ai in just a few clicks. Paste in your SSH information to copy your configuration; everything will automatically be installed and your local jobs deferred to the instance.
- **Multi-modal:** text, image, and audio types as well. Each plugin job specifies its input and output types so that we can transform the data around.
- **Simple:** the whole backend core can be read in under an hour.
- Built on the tried and true AUTOMATIC1111 codebase.
UIs can be written as clients. I will do Dear ImGui, but Gradio would be cool as well for Colab.

Each plugin clearly announces its functions and parameters, so one generic UI drawer can render them all.

The in/out parameters make it possible to create a node UI to chain plugin jobs, a list macro, scripting logic, etc.
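As a rough sketch of that "one generic drawer" idea (all names here are hypothetical, not the actual stable-core API): a plugin could announce each job's parameters as plain data, and one generic loop could render widgets for any plugin.

```python
# Hypothetical sketch: plugins announce parameter specs as plain data,
# and a single generic "drawer" renders widgets for any plugin.

def announce():
    """What a plugin might expose: one spec per job parameter."""
    return {
        "txt2img": [
            {"name": "prompt", "type": "str", "default": ""},
            {"name": "steps", "type": "int", "default": 20, "min": 1, "max": 150},
            {"name": "cfg", "type": "float", "default": 7.5},
        ]
    }

def draw_widgets(job_spec):
    """Generic drawer: maps each parameter type to a widget kind."""
    widget_for = {"str": "textbox", "int": "slider", "float": "slider"}
    return [(p["name"], widget_for[p["type"]], p["default"]) for p in job_spec]

widgets = draw_widgets(announce()["txt2img"])
# e.g. [('prompt', 'textbox', ''), ('steps', 'slider', 20), ('cfg', 'slider', 7.5)]
```

Because the spec is plain data, the same announcement can drive a Dear ImGui drawer, a Gradio form, or a node editor without any per-plugin UI code.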
19
13
20
-
## Core/Plugin Refactor Progress - 10/19
14
+
## Contributions
15
+
16
+
I launch directly with `webui.sh` on Linux. In PyCharm it also works to run `launch.py` for debugging, but I think it's using my locally installed packages instead of the venv; not exactly sure, but it works.

I've removed the webui-user scripts since we won't be doing CLI arguments anymore, at least not in a way you would want to save them for configuration. There didn't seem to be anything else important for end users in the webui-user script, but we may want to review this.
Contribution points for anyone who'd like to help:

- **Interactive Shell:** it would be cool to embed an interactive CLI interface into the server to use it without a UI. Not sure how to do this with Flask yet (we just use `app.run()` to launch it).
- **Plugins:** We already 'have' a bunch of plugins courtesy of AUTOMATIC1111, mainly upscalers. The code still needs to be ported for each of them. After that we can try to implement new ones.
- **UI:** we don't have a UI yet. I will write one in Dear ImGui as soon as the SD plugin is usable.
- **Authentication:** session system to connect with passwords, SSH, etc. No sharing without this, obviously.
- **Plugin Shell Script:** We need a CLI script (written in Python) to interact with plugins:
  - Discovery: figure out how to host plugins on GitHub and automatically collect them for listing.
  - Creation: create a new plugin, ready to work on and push to a repository.
  - Update: update an existing plugin with git pull.
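For the interactive shell point, one possible approach (a stdlib-only sketch; `serve` stands in for the blocking `app.run()` call, and the command set is made up for illustration) is to run the server on a daemon thread and keep a small REPL on the main thread:

```python
import threading

def serve():
    """Placeholder for the blocking server call, e.g. Flask's app.run()."""
    threading.Event().wait()  # blocks forever, like a real server would

COMMANDS = {
    "jobs": lambda: "0 queued",       # hypothetical status command
    "plugins": lambda: "sd, esrgan",  # hypothetical plugin listing
}

def dispatch(line):
    """Resolve one shell command to its output."""
    cmd = line.strip()
    handler = COMMANDS.get(cmd)
    return handler() if handler else f"unknown command: {cmd}"

def main():
    # Server runs on a daemon thread so the REPL owns the main thread.
    threading.Thread(target=serve, daemon=True).start()
    while True:
        line = input("> ")
        if line.strip() == "quit":
            break
        print(dispatch(line))

# call main() to start the server plus the embedded shell
```

The same pattern should work with the real `app.run()` swapped in for `serve`, since Flask's development server is happy to run on a non-main thread.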
### Coding Standards

- **KISS:** We abide by KISS; it must be possible to read and understand the whole thing in under an hour. Always consider more than one approach and pick the simplest, with as few moving parts as possible.
- **Documentation:** There is a severe lack of quality documentation in the world of programming. Long methods are fine, but add big header comments with titles. Check `launch.py` for the recommended amount of documentation.
- **Stability:** Don't use exceptions for simple stuff. Fail gracefully with an error message and a default value instead of throwing an exception anywhere we can expect the possible states. Avoid crashing as much as possible; we should try to keep the backend core running when maxing out VRAM. Maybe we can run plugins in separate processes so the backend keeps running even if a plugin hits OOM.
- **Orthogonality:** Avoid global state as much as possible, with an emphasis on locality. For example, don't do any saving or logging as part of a job; only push some progress and output data and let the specifics be handled externally. Don't pass huge bags of options: if you have a plugin with an option object, pass the individual values you need. If they're defaults, architect the code so that values can be post-processed and defaults applied.
- **Unit Testing:** not planned for the first releases, but test suites could certainly be useful, especially on individual plugins that might change a lot like StableDiffusionPlugin.
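To make the stability point concrete, here is a hedged sketch (the function name and config shape are illustrative, not existing stable-core code) of failing gracefully with a default instead of raising:

```python
import json
from pathlib import Path

def load_config(path, default=None):
    """Read a JSON config; on any expected failure, report and fall back
    to a default instead of raising. The caller never has to try/except."""
    default = default if default is not None else {}
    p = Path(path)
    if not p.exists():
        print(f"config not found at {p}, using defaults")
        return default
    try:
        return json.loads(p.read_text())
    except json.JSONDecodeError as e:
        print(f"config at {p} is malformed ({e}), using defaults")
        return default

cfg = load_config("does-not-exist.json", default={"steps": 20})
# cfg == {"steps": 20} and the backend keeps running.
```

The expected bad states (missing file, malformed JSON) are handled in-place; only truly unexpected failures would ever propagate.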
### Formatting

- 4 spaces indent
- Prefer pathlib `Path` over filename strings
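A quick illustration of the pathlib preference (the paths here are made up):

```python
from pathlib import Path

# String juggling (avoid):
#   model_path = models_dir + "/" + name + ".ckpt"

# Path objects compose with / and carry their own helpers:
models_dir = Path("models")
model_path = models_dir / "sd" / "model.ckpt"

assert model_path.suffix == ".ckpt"
assert model_path.stem == "model"
assert model_path.parent == models_dir / "sd"
```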
## Roadmap

1. ~~Core backend components (server, jobs, plugins) to a usable state.~~
2. Run the StableDiffusionPlugin txt2img job from the CLI.
3. Write a UI to run the job in and see progress.
4. Port some upscalers so we can see the job workflow in action.
## Plugins

Let me know if any other idea comes to mind.
* **StableDiffusion:** txt2img, img2img
* **VQGAN+CLIP / PyTTI:** txt2img, img2img
* **DiscoDiffusion:** txt2img, img2img
* **CLIP Interrogate:** img2txt
* **Dreambooth:** data2ckpt
* **StyleGAN:** data2ckpt, img2img
* **2D Transforms:** simple 2D transforms like translate, rotate, and scale.
* **3D Transforms:** 3D transforms using virtual depth like rotating a sphere, OR predicted depth from AdaBins+MiDaS. Could implement depth guidance to try and keep the depth more stable.
* **Guidance:** these plugins guide the generation plugins.
* **CLIP Guidance:** guidance using CLIP models.
* **LPIPS Guidance:** guidance using LPIPS.
* **Convolution Guidance:** guidance using convolutions. (edge_weight in PyTTI)
* **Audio Analysis:** audio2num, turn audio inputs into numbers for audio-reactivity, using FFT and stuff like that. Can maybe use Magenta.
* **Palette Match:** img2img, adjust an image's palette to match an input image.
* **Flow Warp:** img2img, displace an image using estimated flow between 2 input images.
* **Prompt Wildcards:** txt2txt
* **Whisper:** audio2txt
* Upscalers:
  * **RealSR:** img2img, on Linux this is easily installed through the AUR with `realsr-ncnn-vulkan`.
  * **BasicSR:** img2img, port
  * **LDSR:** img2img
  * **CodeFormer:** img2img, port
  * **GFPGAN:** img2img, port
* **MetaPlugin:** a plugin to string other plugins together, either with job macros or straight-up Python. Could be done without a plugin, but this allows all clients to automatically support these features.
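Since every job declares input and output types, the chaining the MetaPlugin describes can reduce to a simple type check. A hypothetical sketch, not the actual stable-core data model:

```python
# Hypothetical sketch of chaining jobs by their declared in/out types.

JOBS = {
    # name: (input type, output type)
    "txt2img": ("txt", "img"),
    "upscale": ("img", "img"),
    "interrogate": ("img", "txt"),
}

def validate_chain(names):
    """A chain is valid when each job's output type feeds the next input."""
    for a, b in zip(names, names[1:]):
        if JOBS[a][1] != JOBS[b][0]:
            raise ValueError(
                f"{a} outputs {JOBS[a][1]!r} but {b} expects {JOBS[b][0]!r}")
    return True

validate_chain(["txt2img", "upscale", "interrogate"])  # ok: txt -> img -> img -> txt
```

A node UI or job macro would run the same check when the user wires two plugins together, rejecting incompatible links up front.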
## Progress Report - 10/20

- Server/Client design: ready. (really the minimum)
- Plugins: ready. See the contribution section above for what's left.
- SD plugin: 75%, hypernetworks and textual inversion in refactoring.
- UI: starting as soon as the SD plugin is done.
## Progress Report - 10/19

The server now boots up and we can import the StableDiffusion plugin, and even instantiate it without crashing.

The SD plugin processes are being refactored into the job system as JobParameters, which we can extend.

The ProcessResult had too many values being copied around. Instead we are now ke

So the plugin announces its job signatures like this: `name, function, input type, output type, parameter class`

Each invocation function returns one or multiple jobs, and each job has an associated param object to configure it.
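A hedged sketch of what that signature tuple and param-object pattern could look like (names are illustrative only; the real JobParameters classes are still being refactored):

```python
from dataclasses import dataclass

@dataclass
class Txt2ImgParams:
    """Param object configuring one job; plugins extend a base like this."""
    prompt: str = ""
    steps: int = 20
    cfg: float = 7.5

@dataclass
class JobSignature:
    """name, function, input type, output type, parameter class"""
    name: str
    function: str
    in_type: str
    out_type: str
    params: type

SIGNATURES = [
    JobSignature("txt2img", "StableDiffusionPlugin.txt2img",
                 "txt", "img", Txt2ImgParams),
]

# An invocation returns a job: a signature plus a configured param object.
job = (SIGNATURES[0], Txt2ImgParams(prompt="a sphere", steps=30))
```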
A lot of useless UI shit mixed into the backend; we're mostly restarting from scratch for the gradio UI.
AUTOMATIC1111 is still not responding and I don't know any other way to contact him so don't know if we have him on-board. The project must be renamed to stable-core or something not stable-diffusion related.

## Progress Report - 10/18

Current state of things if you wish to contribute and speed things up:

- Many modules have been moved to plugins; they must be reviewed one by one and adapted into their Plugin classes.
- Exorcise all references to `shared`, CLI args, and options.
- We will probably rewrite the UI completely; old pieces can be adapted if necessary. Since we can move each plugin's UI into its own plugin file, the UI will be a lot easier to improve in the future.
- There are more modules remaining; some are just utility functions.