Is there any paid or free tool we can use to generate AI images from product images that we upload? For example, let's say we have a cap and I want to generate an image of a human wearing that same cap.
The basic idea is that you find a rare, unused token (say, f$#sdafad) and then fine-tune your image generation model on a specific set of images (say, 20 images of your red cap at various angles) while telling it that f$#sdafad is the same thing as your red cap.
Then you can start prompting "f$#sdafad resting on the head of a monkey" and your cap will appear on a monkey's head.
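Roughly, the generation side could look like this with the diffusers library; a minimal sketch, assuming the fine-tuned weights were saved to a local directory (the path is a placeholder, and note that real runs usually pick a short tokenizer-friendly string like "sks" rather than punctuation, which gets split into several tokens):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the checkpoint produced by the fine-tuning run (hypothetical local path)
pipe = StableDiffusionPipeline.from_pretrained(
    "./finetuned-red-cap-model", torch_dtype=torch.float16
).to("cuda")

# Prompt with the rare token the model was taught during fine-tuning
image = pipe("f$#sdafad resting on the head of a monkey").images[0]
image.save("cap_on_monkey.png")
```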
The problem with this technique is the fine-tuning part. Fine-tuning can take minutes to hours depending on how many GPUs you have, and it needs to be done separately for every new "token" you want to map to a specific person or object you're adding to your pre-trained model.
Another strategy is some kind of autocropping plus generative infill. You can take a semantic segmentation model like Meta's "Segment Anything", use it to segment out the item of interest manually (perhaps a UI could be built to make this a one-step process), then take the mask and do a generative infill with an image generation model like Stable Diffusion.
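For illustration, a rough sketch of that pipeline with the segment_anything package and a diffusers inpainting pipeline; the checkpoint file, image paths and click coordinates are all placeholders:

```python
import numpy as np
import torch
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry
from diffusers import StableDiffusionInpaintPipeline

# 1. Segment the product with SAM from a single foreground click
product = Image.open("red_cap.png").convert("RGB").resize((512, 512))
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(np.array(product))
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),  # a click somewhere on the cap
    point_labels=np.array([1]),           # 1 = foreground point
)
cap_mask = masks[np.argmax(scores)]       # keep the highest-scoring mask

# 2. Invert the mask: white = regenerate, black = product pixels to keep
infill_mask = Image.fromarray((~cap_mask).astype(np.uint8) * 255)

# 3. Generate everything around the untouched product
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(
    prompt="a person wearing a red baseball cap, studio photo",
    image=product,
    mask_image=infill_mask,
).images[0]
result.save("cap_on_person.png")
```

One caveat: inpainting models can still subtly alter pixels near the mask boundary, so compositing the original product back over the result is a common final step.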
Thanks, but that doesn't work. It takes the product image as a reference and creates similar images. What I'm looking for is to use the exact same product image.
The specification of exactly what you want probably needs refining, and for that it's probably helpful on your end to know what's available and what's actually needed.
There are different methodologies for taking Stable Diffusion and adding some new concept (like a product, from some images) to it. Hypernetworks, textual inversion and Dreambooth are three such methods.
I was trying to put the concept of specific people, of whom I had a number of good photos, into Stable Diffusion. They say that for images of people Dreambooth is better than hypernetworks and textual inversion, and it was. Hypernetworks were nowhere near as good for pictures of people, though I got a good picture once in a while. I never got anything good out of textual inversion for pictures of people.
A more recent technique for putting new concepts into Stable Diffusion is LoRA. I'm not that familiar with it, but it's being used and is another alternative method.
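For what it's worth, loading an already-trained LoRA into a pipeline is only a couple of lines with diffusers; the LoRA path and trigger word below are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach LoRA weights trained on the new concept (hypothetical local path)
pipe.load_lora_weights("./my-product-lora")

# "sks" stands in for whatever trigger word the LoRA was trained with
image = pipe("photo of a sks cap on a wooden table").images[0]
image.save("lora_sample.png")
```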
I have an Nvidia RTX 3060 card and spent hours (sometimes a day) making each model, and usually I had dozens of decent pictures (sometimes over 100) of each subject. It came out well. I needed a number of high-quality photos of each subject from different angles; not just different angles, but some full-body shots, mid-body shots and some shots of just the face. Also, if they were wearing glasses in 95% of the photos, most of the output would have them wearing glasses. And if most of my photos were group shots where I could barely crop their face away from the faces around them, it was harder to do.
If you're willing to put in a little work it's possible, but you're going to have to get your product rendered or photographed at the angle you want it to appear in the photo, then mask the product while having the AI generate the rest of the image.
I've done this with some success using t-shirt mockup templates to get the color, shadows, folds and creases in the clothing right, then regenerating everything around the shirt.
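In rough code, that workflow could look like this; a sketch assuming a 512x512 mockup and a hand-placed rectangle over the shirt (the coordinates and file names are made up):

```python
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

mockup = Image.open("tshirt_mockup.png").convert("RGB").resize((512, 512))

# Mask where white = regenerate and black = keep, so the shirt area is
# painted black to preserve the mockup's colors, folds and creases
mask = Image.new("L", mockup.size, 255)
draw = ImageDraw.Draw(mask)
draw.rectangle([120, 140, 390, 480], fill=0)  # hypothetical shirt bounding box

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(
    prompt="a model standing in a sunlit studio wearing a t-shirt",
    image=mockup,
    mask_image=mask,
).images[0]
result.save("tshirt_scene.png")
```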
> Or... just take a picture of someone wearing the hat.
That's the hard part, right? Because you don't want just any person. You want a very specific person, at a very specific location, with very specific lighting and angles, etc.
> lets say we have a cap and I want to generate a human image wearing that same cap
Do you have an image of the cap in the right orientation, i.e. as it would appear sitting on someone's head?
If not, any algorithm is necessarily going to have to invent what the cap looks like from another angle, making up details on any previously hidden side and guessing at the depth of different parts of the still image in order to rotate it into the right orientation.
If yes, crop it out and paste it onto the target head.
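That last case is plain compositing, no generative model needed. A sketch with PIL, where the file names and paste coordinates are hypothetical:

```python
from PIL import Image

scene = Image.open("person.png").convert("RGBA")
cap = Image.open("cap_cutout.png").convert("RGBA")  # cap cropped out, transparent background

# Paste at the head position; passing the cutout again uses its alpha channel as the mask
scene.paste(cap, (180, 40), cap)
scene.convert("RGB").save("composited.jpg")
```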
Stable Diffusion with a suitably trained LoRA? The trick is finding a tool that gives rapid LoRA fine-tuning. I'm sure they exist, but I can't come up with one right now.
Once you get that part right, using something like Krita with AI Diffusion should give you a nice, fast process flow.
In that image, a little bit of the back side of the cap is visible: the dark border at the bottom right.
So as I understand it, the AI would have to figure that out on its own and remove it before it adds the boy to the image?
Also, that image has watermarks all over it. Does that mean the AI has to detect and remove those?
The perspective of the cap is rather unusual for a photo on a human, as it would cover the eyes. Does that mean the eyes of the boy would be covered in the result? Or do you expect the AI to change the perspective of the product photo?
> Also, that image has watermarks all over it. Does that mean the AI has to detect and remove those?
I don't think that's what they meant. I think they mean they will use photos of products that they themselves (or the factories they buy from) provide, without watermarks, and they want to add generated elements like people into the photo while keeping the product itself exactly as it was.