Derek Sivers
You Look Like a Thing and I Love You - by Janelle Shane

ISBN: 0316525227
Date read: 2023-06-18
How strongly I recommend it: 9/10
(See my list of 360+ books, for more.)

A funny book explaining the basics of AI! The subtitle is "How Artificial Intelligence Works and Why It's Making the World a Weirder Place". A great introduction to AI, with a cute cartoon mascot. The title is from her training an AI to write pickup lines.

my notes

My blog, AI Weirdness:

Pranking an AI - giving it a task and watching it flail - is a great way to learn about it.

The inner workings of AI algorithms are often so strange and tangled that looking at an AI’s output can be one of the only tools we have for discovering what it understood and what it got terribly wrong.

A machine learning algorithm figures out the rules for itself via trial and error, gauging its success on goals the programmer has specified.
It can discover rules and correlations that the programmer didn’t even know existed.

Many AIs learn by copying humans. The question they’re answering is not “What is the best solution?” but “What would the humans have done?”

Worrying about an AI takeover is like worrying about overcrowding on Mars.

AI models get reused a lot, a process called transfer learning.
It lets you use less data by starting with an AI that’s already partway to its goal.
Start with an algorithm that’s already been trained to recognize general sorts of objects in generic images, then use that algorithm as a starting point for specialized object recognition.

Adding hidden layers to our neural network gets us a more sophisticated algorithm, because each layer can combine the insights from the previous layer.
This approach - lots of hidden layers for lots of complexity - is known as deep learning.
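
A minimal sketch of what a hidden layer buys you, assuming a tiny network with hand-picked weights and a simple threshold activation (the weights, the `step` function, and all names here are illustrative, not from the book). No single cell of this kind can compute XOR, but one hidden layer that combines two simpler detectors can:

```python
def step(x):
    """Threshold activation: the cell fires (1) only if its input is positive."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """One layer: each cell takes a weighted sum of its inputs, then applies the activation."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def xor_net(a, b):
    # Hidden layer: cell 0 fires for "a OR b", cell 1 fires for "a AND b".
    hidden = layer([a, b], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output cell combines the hidden cells' insights: "OR, but not AND".
    return layer(hidden, weights=[[1, -2]], biases=[-0.5])[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

In real machine learning the weights are found by training rather than written by hand. They are fixed here only to keep the example short.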

Like a very specialized punisher cell, designed specifically to punish: called the activation function.

The point of using machine learning is that we don’t have to set up the neural network by hand. Instead, it should be able to configure itself into something that does a great job.

Class imbalance: only a handful of every thousand sandwiches from the sandwich hole are delicious.
The neural net may realize it can achieve 99.9 percent accuracy by rating each sandwich as terrible, no matter what.
To combat class imbalance, we’ll need to prefilter our training sandwiches so that there are approximately equal proportions of sandwiches that are delicious and awful.
Class imbalance–related problems show up when we ask AI to detect a rare event.
For example, in medical imaging, an AI may be looking for just one abnormal cell among hundreds.
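
A small sketch of the sandwich problem, assuming a made-up dataset with 5 delicious sandwiches per 1,000 (the numbers and the `balance` helper are hypothetical). It shows why "always say terrible" scores so well on raw data, and how undersampling the majority class removes that shortcut:

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 5 delicious sandwiches in every 1,000.
labels = ["delicious"] * 5 + ["terrible"] * 995

# A lazy "classifier" that rates every sandwich as terrible.
predictions = ["terrible"] * len(labels)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"always-terrible accuracy on raw data: {accuracy:.1%}")

def balance(labels):
    """Undersample the majority class so both classes are equally common."""
    delicious = [y for y in labels if y == "delicious"]
    terrible = [y for y in labels if y == "terrible"]
    n = min(len(delicious), len(terrible))
    return random.sample(delicious, n) + random.sample(terrible, n)

balanced = balance(labels)
balanced_accuracy = sum(y == "terrible" for y in balanced) / len(balanced)
print(f"same classifier on balanced data: {balanced_accuracy:.1%}")
```

On the balanced set the lazy strategy is no better than a coin flip, so the network has to learn something about the sandwiches themselves.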

Almost all the cells in a neural net are as mysterious as this one.

To make a DeepDream image, you start with a neural network that has been trained to recognize something - dogs, for example.
Then you choose one of its cells and gradually change the image to make that cell increasingly more excited about it.

A Markov chain is an algorithm that can tackle many of the same problems as the recurrent neural network (RNN).
You’ve probably interacted with one directly if you’ve used the predictive-text feature of a smartphone.
Markov chains are more lightweight than most neural networks and quicker to train. That’s why the predictive-text function of smartphones is usually a Markov chain rather than an RNN.
However, a Markov chain gets exponentially more unwieldy as its memory increases.
Most predictive-text Markov chains, for example, have memories that are only three to five words long.
RNNs, by contrast, can have memories that are hundreds of words long.
Because training a new Markov chain is relatively quick and easy, the text you get is specific to you.
Your phone’s predictive text and autocorrect Markov chains update themselves as you type, training themselves on what you write.
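
A minimal word-level Markov chain for predictive text, as a sketch only (the function names, the `memory` parameter, and the toy corpus are assumptions, not how any particular phone does it). It also hints at why longer memories get unwieldy: with a vocabulary of V words and a memory of m words, there can be up to V**m distinct states to keep counts for.

```python
from collections import Counter, defaultdict

def train(text, memory=2):
    """Count which word follows each run of `memory` consecutive words."""
    words = text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - memory):
        state = tuple(words[i:i + memory])
        table[state][words[i + memory]] += 1
    return table

def suggest(table, recent_words, k=3):
    """Return up to k of the most likely next words after `recent_words`."""
    followers = table.get(tuple(recent_words), Counter())
    return [word for word, _ in followers.most_common(k)]

corpus = "i love you . i love tea . i love you too ."
table = train(corpus, memory=2)
print(suggest(table, ["i", "love"]))   # most frequent follower comes first
```

Updating the chain as you type is just a matter of adding to the counts, which is why the suggestions can become specific to you.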

A random forest algorithm is a type of machine learning algorithm frequently used for prediction and classification.
To understand the forest, let’s start with the trees.
A random forest algorithm is made of individual units called decision trees.
A decision tree is basically a flowchart that leads to an outcome based on the information we have.
The decision tree can become so deep that it would only work for the specific situations from the training set. That is, it would overfit the training data.

The random forest method of machine learning:
In much the same way as a neural network uses trial and error to configure the connections between its cells, a random forest algorithm uses trial and error to configure itself.
A random forest is made of a bunch of tiny (that is, shallow) trees that each consider a tiny bit of information to make a couple of small decisions.
During the training process, each shallow tree learns which information to pay attention to and what the outcome should be.
Each tiny tree’s decision probably won’t be very good, because it’s based on very limited information.
But if all the tiny trees in the forest pool their decisions and vote on the final outcome, they will be much more accurate than any individual tree.
(The same phenomenon holds true for human voters: if people try to guess how many marbles are in a jar, individually their guesses may be way off, but on average their guesses will likely be very close to the real answer.)
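
A sketch of the voting step only, not a full random forest. It assumes every tiny tree is right 60 percent of the time and that the trees make their mistakes independently (both are simplifying assumptions; real trees in a forest are correlated, so the gain is smaller in practice). Under those assumptions the accuracy of a majority vote can be computed exactly:

```python
from math import comb

def majority_accuracy(n_trees, p_correct):
    """Probability that a majority of n independent voters is right (n odd)."""
    need = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct)**(n_trees - k)
        for k in range(need, n_trees + 1)
    )

for n in (1, 11, 101):
    print(f"{n:>3} trees: {majority_accuracy(n, 0.6):.3f}")
```

Each voter is only slightly better than a coin flip, yet the pooled vote improves steadily as the forest grows.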

The simplest methods of trial and error are those in which you always travel in the direction of improvement - often called hill climbing if you’re trying to maximize a number, or gradient descent if you’re trying to minimize a number.
Your search space - somewhere in that space is your goal.
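
A minimal hill-climbing sketch over a one-dimensional search space, assuming a made-up score function with a single peak at 3 (the function, step size, and names are illustrative). Minimizing instead of maximizing is the same idea with the sign flipped:

```python
def hill_climb(score, start, step=0.1, max_steps=1000):
    """Keep moving to whichever neighbor scores higher; stop when neither does."""
    x = start
    for _ in range(max_steps):
        best = max((x - step, x, x + step), key=score)
        if best == x:
            break          # no neighbor is an improvement
        x = best
    return x

def score(x):
    """Hypothetical goal: a single smooth hill with its peak at x = 3."""
    return -(x - 3) ** 2

peak = hill_climb(score, start=0.0)
print(round(peak, 2))
```

With a bumpier score function, the same loop can stop on a small local hill and never find the tallest one, which is the main weakness of always traveling in the direction of improvement.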

In evolutionary algorithms, each potential solution is like an organism.
In each generation, the most successful solutions survive to reproduce, mutating or mating with other solutions to produce different - and, one hopes, better - children.
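
A toy evolutionary algorithm, as a sketch under stated assumptions: the "organisms" are five-letter strings, fitness is the number of letters matching a hypothetical target word, and the population size, survival rate, and mutation scheme are arbitrary choices for illustration.

```python
import random

TARGET = "weird"                       # hypothetical goal, for illustration only
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(candidate):
    """Number of letters that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng):
    """Replace one randomly chosen letter with a random letter."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def mate(mom, dad, rng):
    """The child takes the start of one parent and the end of the other."""
    cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]

def evolve(pop_size=50, generations=200, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for generation in range(generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], generation
        survivors = population[:pop_size // 5]   # the most successful fifth survive
        children = [
            mutate(mate(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population[0], generations

best, generations_used = evolve()
print(best, generations_used)
```

Because the survivors are carried over unchanged, the best fitness never goes down from one generation to the next.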

Image-generating, image-remixing, and image-filtering tools are usually the work of GANs (generative adversarial networks).
They’re a subvariety of neural networks.
The thing about GANs is that they’re really two algorithms in one - two adversaries that learn by testing each other.
One, the generator, tries to imitate the input dataset.
The other, the discriminator, tries to tell the difference between the generator’s imitation and the real thing.
GANs work by combining two algorithms - one that generates images and one that classifies images - to reach a goal.

The narrower the task, the smarter the AI seems.

I trained a neural net to generate new titles for BuzzFeed list articles:
"17 Things You Aren’t Perfectly And Beautiful"

Text-generating RNNs create non sequiturs because their world essentially is a non sequitur.

AIs lack the contextual knowledge to understand when their solutions are not what humans would have preferred.

It’s really tricky to come up with a goal that the AI isn’t going to accidentally misinterpret.

Many kinds of crime and fraud could be thought of as reward function hacking.

A way to get machine learning algorithms to solve problems without ever being told the goal at all.
Rather, you give them a single, very broad goal: satisfy curiosity.
A curiosity-driven AI makes observations about the world, then makes predictions about the future.
If the thing that happens next is not what it predicted, it counts that as a reward.
As it learns to predict better, it has to seek out new situations in which it doesn’t yet know how to predict the outcome.

The noisy TV problem: the AI was chaos-seeking rather than truly curious.
It would be just as mesmerized by random static as by movies.
So one way of combating the noisy TV problem is to reward the AI not just for being surprised but also for actually learning something.
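
A rough sketch of the difference between the two rewards, assuming a very simple predictor that guesses "whatever most often came next last time" and two made-up sources: a repeating ten-frame "movie" and random static. The reward definitions are illustrative simplifications, not the ones from any specific system.

```python
import random

def watch(source, steps=200):
    """Predict each next symbol; return a list of 1 (surprised) or 0 (predicted)."""
    counts = {}                      # counts[current][next] = times seen
    errors = []
    current = source()
    for _ in range(steps):
        nxt = source()
        seen = counts.setdefault(current, {})
        guess = max(seen, key=seen.get) if seen else None
        errors.append(0 if guess == nxt else 1)
        seen[nxt] = seen.get(nxt, 0) + 1
        current = nxt
    return errors

def make_movie():
    """A predictable source: frames 0..9 repeating forever."""
    frame = -1
    def source():
        nonlocal frame
        frame = (frame + 1) % 10
        return frame
    return source

def make_static(seed=0):
    """An unpredictable source: random digits."""
    rng = random.Random(seed)
    return lambda: rng.randrange(10)

def rewards(errors):
    half = len(errors) // 2
    early = sum(errors[:half]) / half
    late = sum(errors[half:]) / (len(errors) - half)
    surprise = late              # chaos-seeking reward: still being surprised
    learning = early - late      # reward for actually getting better at predicting
    return surprise, learning

movie_surprise, movie_learning = rewards(watch(make_movie()))
static_surprise, static_learning = rewards(watch(make_static()))
print("movie :", movie_surprise, movie_learning)
print("static:", static_surprise, static_learning)
```

Static keeps paying out surprise forever, while its learning reward hovers around zero, which is the point of rewarding learning rather than surprise alone.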

If data comes from humans, it will likely have bias in it.
Since humans tend to be biased, the algorithms that learn from them will also tend to be biased unless humans take extra care to find and remove the bias.

If we try to teach a narrow AI a second task, it’ll forget the first one.
This quirk of neural networks is known as catastrophic forgetting.

Doom-playing AI that was really three AIs in one - one observing the world, one predicting what will happen next, and one deciding the best action to take.

Algorithms tend to become more biased than their training data.
From their perspective, they have only discovered a useful shortcut rule that helps them match the humans in their training data more often.

Pictures showed a man cooking only 33 percent of the time.
The AI labeled only 16 percent of the images as “man.”
It had decided that it could increase its accuracy by assuming that any human in the kitchen was a woman.

That tiny adversarial patch of static managed to convince the AI that a submarine was in fact a bonnet - and that a daisy, a brown bear, and a minivan were all tree frogs.

Voice-to-text algorithms can also be hacked.
Make an audio clip of a voice saying “Seal the doors before the cockroaches get in.”
Overlay noise that a human will hear as subtle static but that will make a voice-recognition AI hear the clip as “Please enjoy a delicious sandwich.”
It’s possible to hide messages in music or even in silence.

Slipping the words ‘Oxford’ or ‘Cambridge’ into a CV in invisible white text can get it past the automated screening.

If you chop an image of a flamingo into pieces and rearrange the pieces, a human will no longer be able to tell that it’s a flamingo. But an AI may still have no trouble seeing the bird.
The AI is acting like a bag-of-features model.
Only looking for the features, not how they’re connected.
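
A tiny sketch of the bag-of-features idea, assuming an image has already been reduced to a list of named patches (the patch names are made up). Counting features while ignoring their positions gives the same answer for the original and the scrambled image:

```python
import random
from collections import Counter

def bag_of_features(patches):
    """Count which features appear, ignoring where they appear."""
    return Counter(patches)

flamingo = ["beak", "eye", "neck", "pink feathers", "pink feathers", "leg", "leg"]

scrambled = flamingo[:]
random.Random(0).shuffle(scrambled)   # chop up and rearrange the pieces

looks_the_same = bag_of_features(flamingo) == bag_of_features(scrambled)
print(looks_the_same)   # True: the bag only records which features are present
```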

One thing machine learning needs humans for is maintenance.
After an AI has been trained on real-world data, the world might change.
The AI had been trained on cars from the 1980s and didn’t know how to recognize modern cars.

If an algorithm sees that there are more arrests in a particular neighborhood than there are in others, it will predict that there will be more arrests there in the future, too.
If the police respond to this prediction by sending more officers to the area, it may become a self-fulfilling prophecy: more police on the streets means that even if the actual crime rate is no higher than it is in other neighborhoods, the police will witness more crimes and make more arrests.
Then the problem will only escalate.
Humans fall for this as well.
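
A deliberately crude sketch of that feedback loop, assuming two neighborhoods with the same true crime rate and a policy of sending every spare officer wherever the most arrests have been recorded (the policy, the numbers, and the function name are all hypothetical):

```python
def patrol_loop(arrests, crime_rate=0.5, officers=10, days=30):
    """Each day, send the officers to the area with the most recorded arrests."""
    arrests = list(arrests)
    for _ in range(days):
        hot = arrests.index(max(arrests))        # the "prediction"
        arrests[hot] += officers * crime_rate    # more officers witness more crime
    return arrests

# Neighborhood 0 starts with one extra recorded arrest; true crime rates are equal.
print(patrol_loop([11, 10]))
```

The one-arrest head start is enough for the first neighborhood to collect every later arrest, even though nothing about the underlying crime rate differs.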

Two copies of a book on Amazon were priced at $1,730,045.91 and $2,198,177.95.
The next day, both books had increased in price, to nearly $2.8 million.
The company that sold the less expensive book would increase its price so that it was exactly 0.9983 times the price of the more expensive book.
The expensive book’s price would increase to become exactly 1.270589 times the price of the cheaper book.
Both companies were apparently using algorithms to set their book prices.
One company wanted to charge as much as it could while still having the cheapest book available.
The company that sold the more expensive book had very good feedback scores. The theory is that it was counting on this to induce some customers to pay a slightly higher price for the book - at which point it would order the book from the cheaper company and ship it to the customer, pocketing the profit.
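
The arithmetic of that spiral, as a sketch: it uses the two multipliers and starting prices quoted above, and assumes each seller reprices once per round, cheaper seller first (the update order and the `price_spiral` name are assumptions). The two multipliers compound to about 1.268 per round, so the prices can only go up:

```python
GROWTH_PER_ROUND = 0.9983 * 1.270589   # combined effect of one full round

def price_spiral(cheap, pricey, rounds):
    """Alternate the two repricing rules and record the prices after each round."""
    history = [(cheap, pricey)]
    for _ in range(rounds):
        cheap = round(0.9983 * pricey, 2)      # undercut the other seller slightly
        pricey = round(1.270589 * cheap, 2)    # mark up over the cheaper copy
        history.append((cheap, pricey))
    return history

for day, (cheap, pricey) in enumerate(price_spiral(1_730_045.91, 2_198_177.95, 3)):
    print(f"day {day}: ${cheap:,.2f} and ${pricey:,.2f}")
```

After one round the more expensive copy lands just under $2.8 million, in line with the figure in the notes.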

Musicians can employ music-generating algorithms, using them to put together a piece of original music to exactly fit a commercial slot for which the music doesn’t have to be exceptional, just inexpensive.

The human role is to be an editor.