On Creating an On-Device Stable Diffusion App, & Deciding Not to Release It: Adventures in AI Ethics
The app was approved in the App Store; it had gotten through app review. I had more than 900 beta testers in the public beta, and snippets of my work had racked up retweets and likes throughout development. This was going to be my biggest side-project app. I'd already made apps on the side that generated enough income for a 20-hour-a-week job, which I then sold off for work-life balance. For this one, though, I had dollar signs in my eyes.
It was interesting because I figured out how to run a powerful image generation technique - previously only accessible via a cloud server or a beefy laptop - entirely on a phone. This opened up the chance for me to drastically undercut competitors relying on expensive cloud servers, while having much more control over the experience.
And I had accomplished some amazing technical marvels. My first image took an hour to generate on a 4-year-old iPhone. I later got it down to 15 minutes on that phone. And down to 2 minutes on a new phone! (Sidenote: this was because I figured out how to get the model running on the Apple Neural Engine, a thing I previously thought was all marketing speak for internals that only matter to the camera team and whatnot. Let me tell you: getting the model running on there just about doubled my speed and memory efficiency. Here is one helpful link.)
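In Core ML terms, the switch is surprisingly small. Here's a minimal sketch of the kind of model-loading code I mean; the "UNet" resource name is purely illustrative, and `.cpuAndNeuralEngine` assumes iOS 16 / macOS 13 or later:

```swift
import CoreML

// Minimal sketch: load a compiled Core ML model and steer it toward the
// Apple Neural Engine. "UNet.mlmodelc" is a hypothetical name for the
// converted diffusion UNet, not a file from my actual project.
func loadUNet() throws -> MLModel {
    let config = MLModelConfiguration()
    // .all lets Core ML pick CPU/GPU/ANE on its own; forcing
    // .cpuAndNeuralEngine keeps the heavy work off the GPU and on the ANE,
    // which is where the speed and memory wins came from for me.
    config.computeUnits = .cpuAndNeuralEngine

    guard let url = Bundle.main.url(forResource: "UNet", withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    return try MLModel(contentsOf: url, configuration: config)
}
```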
Then, at another person's behest, I started looking at ways I could support the artists whose work had been used to train the model originally, since I wasn't the original trainer or purveyor of the dataset. I went down that path, and I've come out the other side saying: no, this version of my app, which uses Stable Diffusion as trained on the LAION-5B dataset (the 5B meaning more than 5 billion image-text pairs, scraped from the Internet and not curated), should not be released. Dang it.
Part One: License
I did do a bit of research at the outset as to whether it was legal to use Stable Diffusion in an app. I looked at the license and determined: yeah, that's pretty dang broad, and the main obligation is that you have the user agree to terms of use, which amount to: don't do bad stuff with this.
Besides, I had already seen a dozen projects on Twitter showing off this and that use of Stable Diffusion, not to mention that the company involved in its creation, Stability.ai, has its own for-profit version, DreamStudio, up and running.
So while all the messaging on the sites says "for research only," it all has a wink-wink vibe in the way that everyone is using it, and in the permissiveness of the license, and in the company's own practice.
There, friends, is where I fell for the classic well-everyone-else-is-doing-it bit. And you know what an overfit machine learning model will tell you to do if it's trained on everyone else doing it? Jump in.
Part Two: Safety
Look, you can generate terrible, terrible images with these things if you're set on it. The original pipeline has an NSFW filter at the tail end. For the coder, however, it's trivial to disable.
I decided to put my checker up front, on the prompt text, doing the best I could to account for the Scunthorpe problem (where innocent words get flagged because they happen to contain a banned substring) and leaving it at that. I even patted myself on the back, because at least in my version there isn't a feasible way of disabling the safety check without jailbreaking your phone. And I was prepared to enable an additional NSFW filter at the end.
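To make that concrete, here's a minimal sketch of that kind of word-boundary check. The function name is a placeholder and the blocklist is obviously not my real one; the point is that matching whole words, rather than raw substrings, is what sidesteps the Scunthorpe problem:

```swift
import Foundation

// Minimal sketch of a word-boundary prompt check. Enumerating actual words
// means "Scunthorpe" stays one token instead of being rejected for the
// substring it happens to contain.
func promptIsAllowed(_ prompt: String, blocklist: Set<String>) -> Bool {
    var allowed = true
    prompt.enumerateSubstrings(in: prompt.startIndex..<prompt.endIndex,
                               options: .byWords) { word, _, _, stop in
        if let word = word, blocklist.contains(word.lowercased()) {
            allowed = false
            stop = true
        }
    }
    return allowed
}

// e.g. promptIsAllowed("Scunthorpe town hall", blocklist: ["somebadword"])
// returns true, where a naive substring scan could reject it.
```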
Could someone still find the magic word combo to generate awful images? Undoubtedly. But the tradeoff of letting the vast majority of well-meaning people with an iPhone or iPad or Mac generate images for their business or for fun seemed to outweigh the potential harm. So, I considered myself good there.
Part Three: Robots Taking Jobs
One thing that you've got to consider in ML is: are you going to automate people away? On the level of society, I think we'll always have work that people want done. Every time I make software that does a job 10 times better, someone says, "that's great, now could it also do 20 other things?" Even if we got to the point where AI generated 100 percent of what we need, we would probably demand that the quality of what we have increase by 1,000 percent, and/or we would consume 10,000 times as much. And folks would find work making that happen.
The bigger concern is in the micro picture. Sure, technology can eventually create more jobs, but real people have to sacrifice time and energy retraining for those jobs. The shift isn't without cost.
That's the big picture, and in the end, that cost is worthwhile to everyone involved, in my estimation. For image generation, the generator is more of a straightforward tool. Already, professional artists are the best at writing prompts that produce the best art, probably because their work already involves so much communication with clients about what they want. And there will be back and forth. A person will say: that's not quite right, the character's arm should be a little lower, and their brow should be more furrowed. And even if you could circle part of the image and dictate what you want, at that point it gets closer to Photoshop.
In the end, you need people at the helm of the tools, and folks making a living in visual arts will adapt, probably more quickly than in a lot of past transitions.
Part Four: Exploitation
This is the part I couldn't get past.
I couldn't exactly say, hey, you wrote "a rose" as your prompt, so I'll give every artist with a copyrighted rose in the training data a two-billionth of a cent. Mostly because that's not how these ML systems work: an image tagged "plant," or just the color red, might have contributed to your red rose, and there really isn't a way to clearly know.
For a bit I considered: hey, what if I set aside proceeds from my app to give to an artist advocacy organization, and paid "royalties" voluntarily and as best I could that way?
That's a nice gesture, but in the end, if everything went perfectly, I would be making bank on artists who in no way gave their consent to be used in a machine learning project. They were guarding their art, and it was ripped from them. And I would be profiting. I could say, well, it's those Stability.ai folks who really should pay. If I release the app as is, though, I'm part of the chain of responsibility.
It's also kind of egregious. I swear that half the prompts I see have Greg Rutkowski's name in there, and he had no say in his art getting used.
It's murky enough when we worry that a person's voice is stolen for a text-to-speech situation or an AI assistant. This is hundreds of times worse.
So that's where I hang up my hat. I'm more amenable to folks using it for parody and play. If it stayed there, it would be fine. But it can't. The system is way too cool and useful. A lot of folks are going to make a lot of money. Just, for now, not me.
Where to now?
You've probably noticed that I've said things like "for now" a lot. If Stability.ai trains a checkpoint free of copyrighted material, I am down to use that. Or at least a checkpoint where artists have been given the chance to opt out.
(Heaven knows I've toyed with the idea of training a model myself. Might take a little bit, but I know the precise steps.)
In the meantime, I'll let the code get a little dusty. Close up the beta. And sigh at all the cool, for-profit demos people will post of Stable Diffusion derivatives.
Yes, the genie is out of the bottle. Doesn't mean I have to make wishes.