Regional prompting / Attention couple proof of concept #639

Merged

merged 108 commits into Acly:regions on Jun 3, 2024

Conversation

@Danamir (Contributor) commented Apr 21, 2024

Hello,

Using the cgem156-ComfyUI custom node, I was able to build a functional regional prompting workflow for SDXL.

Here is a proof of concept in krita-ai-diffusion. By using a new "Attention" control layer type, splitting the prompt into lines and parsing for lines starting with ZONE (or BREAK, PROMPT, ATT, and an optional number... TBD), and assigning a new Paint layer in Krita to each zone, we are able to greatly influence the rendering.

Here is a step-by-step guide:

  • Create a new Paint layer in Krita
  • Draw/fill the region you want to control (any color, any opacity; we use the transparency mask directly)
  • Add a new "Attention" control layer in krita-ai-diffusion interface
  • Assign the new Paint layer to the control layer
  • Repeat for any number of zones
  • Alter the prompt:
    • Add as many lines as there are zones, each starting with ZONE followed by the zone's description
    • Optionally add an extra ZONE line that will be applied only to the image outside the defined zones
    • The first line of the prompt is copied at the beginning of all prompts
    • The last line of the prompt is copied at the end of all prompts

An example with a single drawn zone:

photography of
ZONE a dog
ZONE a cat
two animals, a city street in the background

[images]

The second ZONE line (the cat) is automatically assigned to the area outside the drawn zone. The prompts used for the attention couple are: "photography of a dog, two animals, a city street in the background" and "photography of a cat, two animals, a city street in the background".
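
Here is a minimal sketch of the splitting rule described above, assuming plain string handling (split_zone_prompt is a hypothetical name for illustration, not the plugin's actual function):

    def split_zone_prompt(prompt: str) -> list[str]:
        # Build one full prompt per ZONE line: the first non-ZONE line is
        # prepended and the last non-ZONE line appended to every zone.
        lines = [line.strip() for line in prompt.splitlines() if line.strip()]
        zones = [line[len("ZONE"):].strip() for line in lines if line.startswith("ZONE")]
        rest = [line for line in lines if not line.startswith("ZONE")]
        prefix = rest[0] if rest else ""
        suffix = rest[-1] if len(rest) > 1 else ""
        return [f"{prefix} {zone}, {suffix}".strip(", ") for zone in zones]

Applied to the example above, this yields exactly the two prompts quoted.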

Another example:
[images]

To do:

  • Add a control layer mode "Attention"
  • Get the control layer image opacity as mask (see the sketch after this list)
  • Split main prompt by attention zones
  • Apply cond and masks to AttentionCouple
  • Use ETN_AttentionCouple node
  • Handle render methods
    • generate
    • refine
    • inpaint
    • refine_region
  • User switch for attention couple / region prompt finder
  • Load LoRA only once
  • Typechecks
  • Lint
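
For the "opacity as mask" item, a rough sketch of the idea, assuming the layer pixels arrive as an RGBA numpy array (opacity_to_mask is a hypothetical helper, not the plugin's code):

    import numpy as np

    def opacity_to_mask(layer_rgba: np.ndarray) -> np.ndarray:
        # Use the layer's alpha channel directly as the attention mask, scaled to [0, 1].
        return layer_rgba[..., 3].astype(np.float32) / 255.0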

@Danamir Danamir marked this pull request as draft April 21, 2024 00:48
@Danamir (Contributor, Author) commented Apr 21, 2024

If you want to check my full ComfyUI Regional Prompting workflow that inspired this PR: https://www.reddit.com/r/StableDiffusion/comments/1c7eaza/comfyui_easy_regional_prompting_workflow_3/

Note: The AttentionCouple node uses a JS trick where you have to right-click and use "add input", so a workflow dumped from krita-ai-diffusion is unusable until you connect the correct mask & cond inputs.

@Danamir (Contributor, Author) commented Apr 21, 2024

This development may be of interest to: #386 #387 #567

[edit]: Mentioned the PR in those issues to get some potential testers. 😅

@Danamir (Contributor, Author) commented Apr 21, 2024

An example for #387:

[images]

@illmarzini commented

Sorry if it's too nooby, but how do I implement this? Thanks in advance.

@Danamir (Contributor, Author) commented Apr 22, 2024

Improved the coherency by subtracting the upper masks from each control layer mask. This better respects the order of the control layers.

This only affects images where two control layers overlap. If you put a control layer below another one that covers it entirely, it will have an empty mask and be ignored, as intended.

It could be an option, but in my experience it works better this way; otherwise smaller zones are frequently ignored in favor of bigger covering ones.

I tried manually computing mask intersections to combine prompts, but the difference in the result is marginal, and the computation by nodes would be a nightmare.
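
A minimal numpy sketch of this subtraction, assuming masks are float arrays in [0, 1] ordered bottom-to-top (stack_masks is a hypothetical helper, not the plugin's code):

    import numpy as np

    def stack_masks(masks: list[np.ndarray]) -> list[np.ndarray]:
        # Walk the layers from top to bottom, removing already-covered areas
        # from each mask so that upper layers win where they overlap.
        covered = np.zeros_like(masks[0])
        result = []
        for mask in reversed(masks):
            result.append(np.clip(mask - covered, 0.0, 1.0))
            covered = np.clip(covered + mask, 0.0, 1.0)
        return list(reversed(result))

A layer entirely covered by the layers above it ends up with an all-zero mask, matching the behavior described above.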

@Danamir (Contributor, Author) commented May 29, 2024

> I don't think passing a 0 mask makes sense unless you explicitly don't want to use the conditioning that is passed to KSampler for attention at all

That was only when testing my own workflow, where the masks are made programmatically and sometimes there is a one-pixel gap between two masks. Having a 0.0 mask covering all the regions seems to fill the possible gaps and prevents breakage in some fringe cases. It has nothing to do with the Krita plugin, where all the masks are correct; I'll get rid of the 0 mask in the plugin and see if there is any problem.

> but now that you mention it, it's probably easy to detect if the mask is full res and optionally do the downscale in the node

That would be great, but once again that was me thinking about my own workflow, as it would allow me to get rid of the multiple-of-64 limitation. 😅

For the plugin it makes perfect sense to send as small a mask as possible directly.
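
As an aside, a rough torch sketch of what such an in-node downscale might look like, assuming a (1, H, W) mask and a (1, C, h, w) latent (names and shapes are assumptions, not the node's actual API):

    import torch.nn.functional as F

    def fit_mask_to_latent(mask, latent):
        # Downscale a full-resolution mask to the latent resolution if needed.
        if mask.shape[-2:] != latent.shape[-2:]:
            mask = F.interpolate(mask.unsqueeze(0), size=latent.shape[-2:], mode="bilinear").squeeze(0)
        return mask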

> Text prompts are passed to regions now. Conditioning passed to KSampler can be the same as the first region (previous behavior) or empty, or can be used to pass ControlNet etc.

Perfect.

@Danamir (Contributor, Author) commented May 30, 2024

The code is merged.

I left the masks at full size for now, removed the 0-strength base mask, and got rid of the unnecessary extent scaling to a multiple of 64.

@Acly (Owner) commented May 30, 2024

Thanks! Maybe this would be a good point to clean up this PR and merge it into the regions branch.

@Danamir (Contributor, Author) commented May 30, 2024

Yes, I think it's in a good state for this.

I'll take a look at lint & type checks, but I think there were some type check errors that were unavoidable. I'll see.

[edit]: All done!

@Danamir Danamir marked this pull request as ready for review May 30, 2024 09:38
@BinaryQuantumSoul commented

Can't wait to try this

@Danamir (Contributor, Author) commented May 30, 2024

There may be a bug when removing regions with less than 10% coverage. I did a test with many small regions. In those cases I got this error and a black image:

F:\ComfyUI\custom_nodes\comfyui-tooling-nodes\nodes.py:69: RuntimeWarning: invalid value encountered in cast
  image = Image.fromarray(np.clip(array, 0, 255).astype(np.uint8))

The problem goes away if I set the coverage threshold to a very low value in the code.
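
For what it's worth, that warning usually means NaNs reached the uint8 cast, since np.clip does not remove NaN. A minimal repro of the same warning:

    import numpy as np

    array = np.array([0.5, float("nan")]) * 255
    image = np.clip(array, 0, 255).astype(np.uint8)  # RuntimeWarning: invalid value encountered in cast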

Here is the faulty workflow output: workflow-error.json
And a preview of the masks kept:
[image]

And the working one: workflow-ok.json
And its masks:
[image]

It seems that in the first case the leftover mask calculation subtracted a mask that is not used in the final workflow.

I don't know if the problem can be avoided in ETN_AttentionCouple by patching it, or if the mask computation in process_regions needs to be absolutely correct.

@Danamir (Contributor, Author) commented May 30, 2024

Got it. The culprit is the initialization of the accumulated_mask variable:

    # Remove from each region mask any overlapping areas from regions above it.
    accumulated_mask = None
    for i in range(len(result_regions) - 1, -1, -1):
        region, job_region = result_regions[i]
        mask = region.mask
        if accumulated_mask is None:
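            # BUG: this can run for a region that the coverage check below discards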
            accumulated_mask = Image.copy(region.mask)
        else:
            mask = Image.mask_subtract(mask, accumulated_mask)
(...)

The initialization can happen on a region that will then be discarded by the coverage check just below.

To be correct, accumulated_mask needs to be initialized only once the first valid region is found:

    # Remove from each region mask any overlapping areas from regions above it.
    accumulated_mask = None
    for i in range(len(result_regions) - 1, -1, -1):
        region, job_region = result_regions[i]
        mask = region.mask
        if accumulated_mask is not None:
            mask = Image.mask_subtract(mask, accumulated_mask)

        coverage = mask.average()
        if coverage > 0.9:
            # Single region covers (almost) entire image, don't use regional conditioning.
            print(f"Using single region {region.positive[:10]}: coverage is {coverage}")
            result.control += region.control
            return result, [job_region]
        elif coverage < 0.1:
            # Region has less than 10% coverage, remove it.
            print(f"Skipping region {region.positive[:10]}: coverage is {coverage}")
            result_regions.pop(i)
        else:
            # Initialize accumulated_mask from the first valid region,
            # otherwise accumulate its mask for the regions below.
            if accumulated_mask is None:
                accumulated_mask = Image.copy(region.mask)
            else:
                accumulated_mask = Image.mask_add(accumulated_mask, region.mask)
            region.mask = mask

@Danamir (Contributor, Author) commented May 30, 2024

Still, for those little regions, I would rather use a 5% coverage check. Maybe we could have this as a setting value?

@Acly (Owner) commented May 30, 2024

Ah, nice find, it pays off to do some stress testing... I don't mind setting the coverage check to 5% (but I don't really want to put it in the settings).

There is still some code I think we no longer need:

  • use_single_region - there's Generate/Refine Region which mostly covers that
  • attention_mask, mask_composite, load_image_mask... not sure which of those are still used
  • attention control layer
  • find_region_prompts

@Danamir (Contributor, Author) commented May 30, 2024

I was doing some tests to see if I could get rid of use_single_region. Since the actual region detection uses the mask + feather for its coverage detection, with very close regions I get prompt bleeding between two adjoining regions. use_single_region was a trick to make sure that only the most prevalent region would be used.

Here is an example:

The regions:
[image]

The rendered image, with an annoyed man and a smiling girl. You can already imagine the problem, because the man was rendered a little on the girl's side:
[image]

When refining only the man's face, he should still be annoyed (this uses the use_single_region trick):
[image]

And using the standard region detection, he is somewhat happy:
[image]

So I really like this option, as it offers more predictability when refining.

NB: find_region_prompts is then used to order the regions by mask coverage.
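
A rough sketch of that ordering, assuming each region exposes a numpy mask and the current selection is a mask as well (names are assumptions for illustration, not the plugin's API):

    import numpy as np

    def order_regions_by_selection_overlap(regions, selection_mask: np.ndarray):
        # Sort regions so the one covering most of the selection comes first.
        def overlap(region) -> float:
            return float((region.mask * selection_mask).sum())
        return sorted(regions, key=overlap, reverse=True)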

@Acly (Owner) commented May 30, 2024

IMO in this case it makes more sense to adjust the region mask to better match the result; then the 90% rule should take care of it. Alternatively, select the region you want and switch to "Refine Region" (I don't know if it considers the selection at the moment, but that could make sense).

In any case, I don't think a bool needs to be passed so far down, with branches in the workflow. If you want a single region, only a single region should be passed to the api/workflow.

@Danamir (Contributor, Author) commented May 30, 2024

> in this case it makes more sense to adjust the region mask to better match the result, then the 90% rule should take care of it

This would be ideal, but I don't see how this can be done when the rendering of a region sometimes bleeds into another one. I tried with 0% growth/feathering/padding and "Selection Mask" as context, and it still uses the two regions instead of only one.

I read your response too fast, my bad. Of course, painting the region manually should be an acceptable fix.

That being said, I perfectly understand if you want to get rid of this option. I can keep it in my custom fork only. 😅

@Acly (Owner) commented May 31, 2024

My main concern is that the functionality is in the wrong place; it should happen much earlier, with the rest of the region selection/filtering process. If you don't want regions, just don't pass them to the workflow in the first place.

Also, the model.region_only bool does a very similar thing already; we just need to tweak the logic in model.generate a bit to also work when there's a selection.

@daniel-richter (Contributor) commented

I don't know whether it's already important or related: I wanted to try out the PR, but I only have CloudGPU available. Since cgem156-ComfyUI isn't installed in the "Stable Diffusion ComfyUI for Krita" image, I tried to add it as a custom node. That didn't work; should it be possible to add this/a custom node to the ComfyUI instance of the CloudGPU image?

@Danamir (Contributor, Author) commented May 31, 2024

The last commits don't use the cgem156 node anymore, but the one adapted by Acly. You have to manually switch the comfyui-tooling-nodes branch to preview to get the node.

@Acly Acly merged commit 15c7dbe into Acly:regions Jun 3, 2024
2 checks passed
Acly pushed a commit that referenced this pull request Jun 3, 2024
* Attention couple and mask manipulation nodes
* Apply attention control layer in generate
* Handle attention call in refine method
* Subtract lower masks from each attention control layer mask
* Use conditioning positive prompt
* Prevent pointer creation to first region mask
* Handle attention couple via the new Regions system
* Improved attention couple usage for hires fix
* Find region prompts according to current selection overlap
* Check if LoRA already loaded when extracting from prompts
* Correctly apply attention conditioning to hires pass
* Also apply attention couple to inpaint hires pass. Fix wrong image type error.
* Use empty background region by default
* Prevent result initialization to overwrite {prompt} token
* Merge region positive prompts with root positive prompt in the regions to_api
* No more prompt merging in apply_attention. Removed unnecessary negative prompts clip encoding
* Using ETN_AttentionCouple and ETN_ListAppend nodes to apply attention couple
* Initialize accumulated_mask with the first valid region mask