Knitters will often look for patterns that work with the yarn they already have, rather than purchasing new yarn for each pattern.
That means that their choice of patterns is often constrained by the kind of yarn required to make the pattern.
Leaving these factors out of my algorithm meant that the recommendations it generated were often superficially acceptable but didn’t reflect knitters’ actual preferences at all.
To fix these issues, first I had to go back to the source of the data and, cursing myself, download it all again, including the extra metrics I now realised were vital to the analysis.
Then, in consultation with my hastily recruited knitting experts, I created weightings: a value to scale each metric by, reflecting its importance to knitters in choosing patterns.
With these improvements, I could make recommendations that started to make sense to actual knitters.
The final algorithm chiefly considered the craft of the pattern (whether it was knitting, crochet, or something else), the category, the yarn weight, and the difficulty.
But it also took into account keywords, the needle gauge, and a host of other factors.
For each of these, it would calculate a “distance” between patterns.
The more factors they had in common, the smaller the distance.
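A minimal sketch of how weighted metrics might combine into a single “distance”, assuming a weighted Euclidean distance over the main metrics named above. The metric names, the 0/1 treatment of text-valued metrics, the weight values and the example patterns are all hypothetical, not the author’s actual weightings or code, and keywords are left out for brevity.

```python
import math

# Hypothetical weights reflecting how much each metric matters to knitters.
# Illustrative values only; these are not the article's real weightings.
WEIGHTS = {
    "craft": 4.0,
    "category": 3.0,
    "yarn_weight": 3.0,
    "difficulty": 2.0,
    "needle_gauge": 1.0,
}


def weighted_distance(a: dict, b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted Euclidean distance between two patterns' metadata.

    Text-valued metrics (craft, category, yarn weight) contribute 0 when the
    two patterns match and 1 when they differ; numeric metrics contribute
    their difference. Each squared term is scaled by the metric's weight,
    so the more factors two patterns share, the smaller the distance.
    """
    total = 0.0
    for metric, w in weights.items():
        x, y = a[metric], b[metric]
        if isinstance(x, str):
            diff = 0.0 if x == y else 1.0
        else:
            diff = float(x - y)
        total += w * diff ** 2
    return math.sqrt(total)


# Made-up metadata for illustration (difficulty on a 1-10 scale, needle size in mm).
mr_dangly = {"craft": "knitting", "category": "soft toy", "yarn_weight": "worsted",
             "difficulty": 3.0, "needle_gauge": 4.0}
hedgehog = {"craft": "knitting", "category": "soft toy", "yarn_weight": "dk",
            "difficulty": 3.0, "needle_gauge": 3.5}
shawl = {"craft": "knitting", "category": "shawl", "yarn_weight": "lace",
         "difficulty": 8.0, "needle_gauge": 3.5}

print(round(weighted_distance(mr_dangly, hedgehog), 3))  # 1.803
print(weighted_distance(mr_dangly, shawl))               # 7.5
```

Because the weights multiply each squared term, a mismatch in a heavily weighted metric such as difficulty pushes two patterns much further apart than a small difference in needle size.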
The “Spring Collection”
For Mr. Dangly, the knitted soft-toy monkey from above, this algorithm recommended the “Spring Collection”, a set of soft toy patterns including a hedgehog, a frog, a bunny, and a lamb.
They were similarly easy to knit and employed similar techniques: a seamed construction and the use of fringing (for Mr. Dangly, a dapper tuft of hair; for the Spring Collection, a hedgehog’s spines).
In many respects, this is a very good recommendation.
But there was something not quite right.
Mr. Dangly has a certain eccentricity about him, a degree of whimsy or style that makes him stand out.
By contrast, while superficially similar to Mr. Dangly, the members of the Spring Collection lack his unique charm.
They seem a little bland, and slightly twee.
I feel like friends of Mr. Dangly are unlikely to get on with the members of the Spring Collection.
The algorithm has made a suggestion which is correct on paper, but which misses some of the nuance of the patterns.
It misses something that no human could fail to see.
It misses their personality.
That left the collaborative algorithm to try to do better.
Collaborative filtering algorithms are very different to their content-based cousins.
Where our content-based algorithm uses metadata about the patterns themselves, a collaborative filtering approach looks at information about the people who interacted with those patterns.
Ravelry allows users to “favourite” patterns and store a list of their favourites.
These lists are public, which meant I was able to download lists of the “favourite” patterns of several hundred thousand Ravelry users.
Individually, these lists are idiosyncratic and not very informative.
Someone who likes a chunky knitted scarf might also have an interest in novelty tea cosies, and this shouldn’t imply anything to us about either scarves or chintzy home decor.
But in aggregate, in the kinds of volumes I was able to retrieve from Ravelry, this information can become very useful indeed.
Because collaborative filtering uses comparatively complex and subtle information about human behaviours and preferences, my hope was that I’d be able to capture something about the elusive Mr. Dangly more accurately.
In many ways this algorithm was much simpler to construct, although the volume of data was much larger.
Where the content-based algorithm used a dataset in which each column represented some aspect of the pattern, for the collaborative algorithm I built a dataset in which each row represented a pattern and each column a user, with each value recording whether that user had “favourited” that pattern.
With this dataset, we can calculate similarity in exactly the same way as we did with the content-based algorithm.
Each user is a “dimension” in the Euclidean distance calculation.
Patterns that are liked by the same users are considered similar.
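A minimal sketch of the favourites matrix and the Euclidean similarity described above, assuming the favourites arrive as a hypothetical mapping from each user to the set of patterns they favourited. The users and pattern identifiers are made up; at the scale of several hundred thousand users, a sparse matrix library would be a more sensible choice than nested Python lists.

```python
import math


def build_matrix(favourites):
    """Build a pattern-by-user matrix from favourites lists.

    `favourites` maps each user to the set of patterns they favourited.
    Each row is a pattern, each column a user, and a cell is 1 if that
    user favourited that pattern, else 0.
    """
    users = sorted(favourites)
    patterns = sorted({p for favs in favourites.values() for p in favs})
    matrix = [[1 if p in favourites[u] else 0 for u in users] for p in patterns]
    return patterns, users, matrix


def euclidean(row_a, row_b):
    """Euclidean distance, treating each user as one dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))


def most_similar(target, patterns, matrix, k=3):
    """Return the k patterns whose rows are closest to the target's row."""
    target_row = matrix[patterns.index(target)]
    scored = sorted(
        (euclidean(target_row, row), name)
        for name, row in zip(patterns, matrix)
        if name != target
    )
    return [name for _, name in scored[:k]]


# Hypothetical favourites lists for four made-up users.
favourites = {
    "user_a": {"mr_dangly", "socktopus", "tea_cosy"},
    "user_b": {"mr_dangly", "socktopus"},
    "user_c": {"spring_collection", "washcloth"},
    "user_d": {"mr_dangly", "socktopus", "spring_collection"},
}
patterns, users, matrix = build_matrix(favourites)
print(most_similar("mr_dangly", patterns, matrix))
# ['socktopus', 'tea_cosy', 'spring_collection']
```

The tea cosy ranks second only because one user happened to favourite it alongside Mr. Dangly. With just four users, a single idiosyncratic list carries a lot of weight, which is why this approach only becomes useful in aggregate.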
The first results of this algorithm were, to my eye, not promising.
A tiny knitted purse was, according to the algorithm, very similar to a lace-edged washcloth.
A pair of cabled socks was similar to an intricate shawl.
It made no sense, until I spoke to my knitting experts, who pointed out more subtle similarities that my untrained eye had missed.
The purse and the washcloth were both small projects that used a variety of techniques, patterns a new knitter might use to develop their skills.
The socks and the shawl both used two colours of fine yarn, which might be preferred by someone looking to use up their existing stock of wool.
But other recommendations were baffling even to my expert knitters.
Patterns that had very few interactions — new patterns or obscure ones — tended to get idiosyncratic or plain wrong recommendations, the product of the algorithm having very little information on which to base its similarity calculation.
This is a weakness of the collaborative algorithm — the “cold start problem” — rearing its head.
There are some mathematical tricks you can use to get around this problem, and I ended up using a lot of these (I won’t go into them here in any detail), but the algorithm continued to give poor recommendations for any item without a good number of user interactions.
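The workarounds used here aren’t described, and this sketch doesn’t attempt to reproduce them. It only illustrates one reason why sparsely favourited patterns behave oddly, assuming a plain Euclidean distance on 0/1 favourite vectors: for binary vectors, the squared distance is simply the number of users who favourited exactly one of the two patterns. The users and patterns below are hypothetical.

```python
import math


def favourite_distance(fans_a, fans_b):
    """Euclidean distance between two patterns' 0/1 favourite vectors.

    For binary vectors the squared distance is the number of users who
    favourited exactly one of the two patterns, i.e. |A| + |B| - 2|A & B|,
    which is the size of the symmetric difference of the two sets of fans.
    """
    return math.sqrt(len(fans_a ^ fans_b))


# Two obscure patterns, each favourited by one different (made-up) user.
obscure_1 = {"user_1"}
obscure_2 = {"user_2"}

# Two popular patterns with 100 fans each, 90 of them shared.
popular_1 = {f"user_{i}" for i in range(100, 200)}
popular_2 = {f"user_{i}" for i in range(110, 210)}

print(favourite_distance(obscure_1, obscure_2))  # sqrt(2), about 1.41
print(favourite_distance(popular_1, popular_2))  # sqrt(20), about 4.47
```

The two obscure patterns, which share no fans at all, come out closer than two popular patterns that share 90 of their 100 fans. Sparse rows sit near the origin and therefore near each other, so a pattern with few favourites tends to be matched with other little-favourited patterns rather than with genuinely related ones.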
The Socktopus
But how did it fare with our friend Mr. Dangly?
Fortunately, Mr. Dangly’s peculiar charms had garnered him sufficient attention to create a good set of recommendations.
And the leading recommendation was an ideal illustration of the strengths of a collaborative recommendation.
“Socktopus” is, as the name implies, an octopus who wears socks.
He’s easy to make and uses some of the same techniques as Mr. Dangly, but he lacks some of Mr. Dangly’s features: there’s no seam and no fringing.
But crucially, the Socktopus has something of the same air of whimsy, the same sense of personality, as Mr. Dangly.
The collaborative algorithm has found something that the content-based algorithm hasn’t — not just the raw information, but how people relate to that information.
It understands something about what those patterns mean to people.
For this reason, collaborative algorithms are hugely popular with anyone trying to recommend complex cultural products, whether books, movies, music, or even knitting patterns.
But when online platforms make extensive use of collaborative recommendations to guide what content they show to users, they risk falling victim to another pernicious weakness of this kind of algorithm.
Collaborative algorithms are very good at finding subtle connections between items, at defining the boundaries of taste.
They can distinguish between sub-genres of music, so that Death Metal fans are never offended by hearing Doom Metal (which, I am assured, is a totally different thing).
They can finely parse topics of news articles, so that someone who follows their local sports team is not bothered with local politics.
But this policing of boundaries comes at a cost.
They can create a “bubble” effect.
A bubble effect occurs when users are only shown a small slice of a much larger pool of content and never become aware of anything outside their tiny portion.
When an algorithm is driven by what users interact with, and users only see what the algorithm shows them, there is a feedback loop.
Users interact with what they are shown, and they are shown what they (or users like them) interact with.
Without care, users can find themselves wrapped in a cocoon of “safe” content, only ever seeing the same sorts of things.
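The loop can be made concrete with a deliberately crude toy model, which is an illustration rather than anything drawn from a real platform: a hypothetical recommender that always shows the most-clicked items, and users who click on everything they are shown.

```python
def simulate_feedback_loop(initial_clicks, slots, rounds):
    """Toy model of a recommendation feedback loop.

    Each round, the hypothetical recommender shows the `slots` items with the
    most clicks so far (ties broken by lowest index), and the user clicks on
    every item shown. Returns the final click counts and the set of item
    indices that were ever shown.
    """
    clicks = list(initial_clicks)
    ever_shown = set()
    for _ in range(rounds):
        # The algorithm shows what has been interacted with most...
        shown = sorted(range(len(clicks)), key=lambda i: (-clicks[i], i))[:slots]
        ever_shown.update(shown)
        # ...and users interact with what they are shown.
        for i in shown:
            clicks[i] += 1
    return clicks, ever_shown


clicks, ever_shown = simulate_feedback_loop([3, 2, 2, 2, 1], slots=2, rounds=10)
print(clicks)      # [13, 12, 2, 2, 1]
print(ever_shown)  # {0, 1}
```

Items 2 and 3 start level with item 1, but they lose a single tie-break in the first round and are never shown at all, while the early winners’ lead only grows. Real recommenders are far more sophisticated, but the same dynamic can arise whenever exposure drives interaction and interaction drives exposure.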
This is a risk for online platforms commercially.
If users never discover new things they might like, and if they never have an opportunity to expand their tastes, they get bored and move on.
But it also carries a more subtle social risk.
In a feedback loop, people are only shown content that suits their taste, that they agree with, content that fits their existing worldview.
It is unpleasant to encounter things which challenge what you believe about the world; to hear music you don’t like, watch films about things you don’t understand, or read news articles about terrible events.
But encountering and attempting to understand new and challenging things is how we learn to accommodate and cooperate with people different from ourselves.
By shutting ourselves away with like-minded people, seeing only things we like, we wall ourselves off from the wider world.
It’s easy to make assumptions about something when you don’t know a lot about it.
I underestimated the complexity and nuance of knitting and its enthusiasts.
I’d never really encountered it before, and it was very uncomfortable to have that ignorance exposed.
Both collaborative and content-based recommendation algorithms can only extrapolate patterns from what they’ve seen in the past.
They, like me, are apt to make assumptions based on what they already know.
This is not a problem with the algorithms themselves — the maths is unimpeachable.
The problem is with the people building the algorithms, and with the people using them.
My downfall with the content-based algorithm was assuming that a lot of data and some clever maths were enough to understand a complex field.
The risks with the collaborative algorithm are more subtle, but they’re also born of a kind of arrogance, a belief that we can completely anticipate a user’s needs, that they are as simple as finding something like what they already enjoy.
In both cases, embracing complexity, and adopting a more holistic approach, can help mitigate these risks.
This is a theme we’ve seen with the previous algorithms, and we’ll see again in future essays.
It’s a common feature of machine learning algorithms.
They are only as clever as the data that was used to create them.
If they are to truly learn, truly create something new, rather than just reflect our existing knowledge and biases, then it is us that must teach them.
It is us that must confront our ignorance.
It is us that must learn.
The previous article in this series, “Linear Regression and Lines of Succession”, is available here.
Code for this article can be found on my GitHub, here.
The next article will be published in April.
To make your own Mr. Dangly or Socktopus, go to www.
com (account required).