[Feature Request] Cogvlm v7 and internlm-xcomposer2-vl-7b #2

@311-code

Description

I have used CogVLM v7 (through a guy on Patreon) to caption 120,000 images, and it does a very good job. It even handles some uncensored photos (in detail) if I use a special prompt with English and Chinese characters. I also found internlm-xcomposer2-vl-7b to be pretty good, but it seemed a bit more censored.

Would love to see these two models added as options to this repo. Also, not sure if this is still sci-fi, but it would be great to have a vision model (CLIP vision or similar) that can follow directions. For example: if an image will not crop to 1024x1024 without cutting out the subject, put black bars on the left and right sides instead (poor man's masking). That has always worked well for me, as long as most of the dataset wasn't like this.
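To make the black-bar idea concrete, here is a rough sketch of the arithmetic only. It assumes the image is scaled so its longer side is 1024 and the remaining space is split evenly into bars; the function name `letterbox_geometry` and the returned keys are made up for illustration and are not part of this repo.

```python
def letterbox_geometry(width, height, target=1024):
    """Return the resized size and bar widths needed to fit an image
    into a target x target square without cropping anything.

    Hypothetical helper: the name and return layout are illustrative.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Scale so the longer side exactly matches the target.
    scale = target / max(width, height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))

    # Whatever is left over becomes black bars, split as evenly as possible.
    pad_x = target - new_w
    pad_y = target - new_h
    left = pad_x // 2
    top = pad_y // 2
    return {
        "size": (new_w, new_h),
        "left": left,
        "right": pad_x - left,
        "top": top,
        "bottom": pad_y - top,
    }


# A portrait image gets bars on the left and right sides.
print(letterbox_geometry(1000, 2000))
```

If Pillow is available, I believe `ImageOps.pad` performs the actual resize-and-pad in one call, so the sketch above is mostly useful for seeing what numbers are involved.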

Another scenario: when it's a full-body shot and the subject is next to someone, the instruction could be "crop the image so that the entire person is in view and any other person is not; move them to the side of the image".

Maybe you could have a long context for certain scenarios and have the AI/CLIP vision follow your preset rules for cropping.
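As a very rough sketch of what one preset rule could look like: assume some model has already returned a bounding box for the main subject (that part is hypothetical and not shown). The rule below picks a square crop when the subject fits inside one, and falls back to black bars when it does not. The function name `plan_square` and the `"crop"`/`"pad"` labels are illustrative only, and it does not handle the "keep the other person out of frame" case.

```python
def plan_square(width, height, subject_box):
    """Decide between a square crop and padding for one image.

    subject_box is (left, top, right, bottom) in pixels and is assumed
    to come from some detector or vision model (not shown here).
    Returns ("crop", box) or ("pad", None). Hypothetical helper.
    """
    left, top, right, bottom = subject_box
    side = min(width, height)  # largest square that fits in the image

    # Rule: if the subject is bigger than the largest square crop,
    # cropping would cut it off, so pad with black bars instead.
    if right - left > side or bottom - top > side:
        return ("pad", None)

    # Otherwise centre the square on the subject, then clamp it so the
    # crop window stays inside the image.
    cx = (left + right) / 2
    cy = (top + bottom) / 2
    x0 = int(min(max(cx - side / 2, 0), width - side))
    y0 = int(min(max(cy - side / 2, 0), height - side))
    return ("crop", (x0, y0, x0 + side, y0 + side))


# Full-body portrait shot: the subject is taller than any square crop.
print(plan_square(1000, 2000, (200, 100, 800, 1900)))
# Landscape shot with a smaller subject: a square crop works.
print(plan_square(2000, 1000, (1200, 100, 1700, 900)))
```

More rules (for example, shifting the crop away from a second detected person) could be layered on top in the same style, one check per scenario.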

P.S. The SD3 dataset was captioned using 50 percent CogVLM. I had no issues running it on my 4090, but I forgot how many parameters this one had. I believe it may have been thudm--cogvlm-chat-hf

Labels: enhancement, question