Philipp Schmitt, Computed Curation

JTF (just the facts): Published in 2018 by Bromide Books (here). Cardboard covers, leporello binding, 120 pages, with 115 color reproductions, housed in a cardboard slip box. Includes a tag index and a short text on the back of the slip box. In an edition of 300 copies. Design by Philipp Schmitt and Margo Fabre. (Cover and spread shots below.)

Comments/Context: In the past decade or two, as photographs have increasingly become a universal language of communication the world over, the idea that we need to teach people how to be visually literate (i.e. how to “read”, analyze, and think critically about photographs) has gone from being an esoteric academic thought to a more pressing reality. Given the flood of imagery we are now exposed to every minute, hour, and day, not only do we need to be able to simply recognize what a photograph is showing us, we must also make judgements and draw conclusions about what it represents, what it omits, the point of view of the photographer, the “truthfulness” of the depiction, and many other details and nuances that our busy, distracted brains must process in an instant. In a sense, we need to train ourselves to see with much more robustness, not only so we are not fooled by fakes and trickery, but so we can reliably discern the central ideas the photographs we encounter are trying to communicate.

In parallel with our efforts to help humans consume imagery with more intention, computer scientists are trying to teach computers to do the same thing. But even with layers of complex software algorithms, high speed image processors, and sophisticated trainable AI systems, this challenge remains monumental, at least at the level of subtlety we take for granted. Even the simplest tasks of image recognition and categorization turn out to be extremely complex when translated into the abstract rules, measurements, pattern matching motifs, and lines of code of machine learning and computer vision. When we then attempt to synthesize all of that image-based data and information into something simulating interpretation, understanding, or insight, those leaps get exponentially harder.

Philipp Schmitt’s photobook Computed Curation is in a sense an aggregate portrait of how computers see, at least at this particular technological moment in time. It isn’t so much a gotcha catalog of the mistakes and errors that computers make when trying to comprehend photographs, but an opportunity to try to place ourselves in the shoes of the machines for a moment, to follow the logic of how these machines “see” and “think”. And what emerges is a very foreign kind of image recognition intelligence, one that is “right” in many cases (to us, that is) but also utterly “wrong” in many obvious ways, the computer often making connections and conceptual leaps based on its own hidden criteria that feel head-shakingly and mystifyingly strange.

Each page in Computed Curation is laid out in the same manner. A color photograph is placed in the center of the white space, with a date and location stamp nearby in small letters. On the top of the page, a text caption explaining the picture is provided, along with a percentage confidence rating from 0% to 100% (representing how confident the computer is of its suggestion), calculated out to a staggering thirteen decimal places. At the bottom, a list of text-based tags for the photograph is offered. All of the information attached to each image has been gathered directly from various software systems (Google Cloud Vision API, Microsoft Cognitive Services API, Adobe Lightroom, and other object detection algorithms).

Schmitt wasn’t particularly generous when he selected the images for the software systems to analyse; there are no straightforward shots of dogs or footballs or cars set against blank backgrounds in clear light, making it easy for the computers to do simple pattern matching or assessment. Instead, he has chosen mixed images of various kinds, where landscapes and urban scenes are dotted with people doing things and objects left at random, creating compositions that are layered and complicated. And not surprisingly, the computers struggle with the task of identification.

What’s fascinating about the details of this exercise is not so much that the computers have a hard time – that we might have expected – but how they fail. One picture labeled with better than 67% confidence is entitled “a crowd of people watching a large umbrella”. It is indeed an image of a large crowd waiting for a performance of some kind to begin; the crowd doesn’t appear entirely focused on the central figure of a man who is apparently about to get started. Off to the side is a large pink umbrella, and while it is a bold element of the composition, no human would ever think that the crowd was primarily looking at the umbrella. In a sense, the computer is literally correct, but the essence of what is actually happening is lost. The tags for the image are given as “crowd, people, spring, festival, tradition”. Here again, the computer is right on the first two, close on the third (the date stamp is August, but maybe the short sleeves of the clothes imply springtime weather), and extrapolating on the last two, making the connection that the gathering of people in such an arrangement might be associated with a larger context. Maybe the man is going to make a political speech, do some magic tricks, sing some local folk songs, or introduce someone else – we can’t possibly tell, but the computer tries to figure it out, using what it has seen before.

On the other side of the spread, an image depicts some tarps that are strung up on metal scaffolding, with a garbage can in front and a roller coaster in the background, the ground of the setting appearing to be a boardwalk of some kind. The caption reads “a yellow boat sitting on top of a bridge”, with 18% confidence, and the tags are “vehicle, ship, sea, sailing ship, mast, watercraft, tall ship, walkway, dock, pier”. These conclusions or categorizations seem utterly crazy until you squint your eyes a bit and see that the arrangement and shape of the two tarps resembles that of the sail and jib of a sailboat, with the metal poles of scaffolding providing the “masts”. Again, no human would make this mistake, but we can absolutely see how the computer was fooled, and how it “sees” the image as a set of spatial relationships that can be matched to other pictures in its archives. Maybe it has never been shown a photograph of a roller coaster, and so assumed it was a bridge, which might even seem plausible. It did identify the foreground planks correctly, concluding they were part of a walkway of some kind. The whole story doesn’t quite coalesce, but we can see how the computer was trying to make sense of what it saw as conflicting information.

This kind of oddball explication happens with nearly every image in the book. “A man riding a skateboard up the side of a ramp” depicts a construction site, with a dumpster and a tube to drop the debris from the second floor – there is actually no man in the picture at all. “A man standing on top of a lush green field” is an image of a golf course green with a red pin placement flag – again no man. “A group of colorful flowers in it” captures stacks of plastic chairs and beach chaises. And “a bird that is flying in the sky” does indeed capture a seagull against fluffy clouds in a blue sky; it just omits the fact that the image also includes the Statue of Liberty in the distance. Paging through this book is like speaking to a foreigner, or an alien, or perhaps a small child – someone whose frame of reference is so different from our own that the conclusions they draw from the evidence provided are so unexpected that it is hard not to smile in perplexed wonder. As a whole, the book is at times weirdly comic and surreal, often puzzling and confusing, and always brainy and thought-provoking.

Schmitt has also used algorithms to do the curation, feeding all the images into the system and allowing it to deduce the best path through the photographs, based on similarities of color, subject matter, composition, and other factors. The sequencing that emerges is actually logical in its own way, and if we look closely, we can guess at how the computer made some of its choices. In one series, large expanses of foreground dominate a snow covered field and then a gravel courtyard, followed by more snow as the setting for a bike ride and then a wider snow covered mountain vista, and then wide water in front of a city and then ocean waves – with just a little attention, we can watch as the computer puts things next to each other in a kind of order. Some of the other connections are pleasingly inspired – the curve of a red hose on a sidewalk followed by a dark blur with a similar shape, a flattened view of horizontal striping on a wall next to a flattened view of a car in front of a construction site, the deep green of fake grass placed next to the deep green of the ocean, or the truly wonderfully eclectic pairing of a crowd of tourists and then a crowd of water lilies in a pond.
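For readers curious how a machine might "deduce a path" through a set of pictures, here is a minimal sketch of one simple approach: a greedy nearest-neighbor ordering. The review does not say which method Schmitt actually used, so this is an illustration only, not his algorithm. The three-number feature vectors (standing in for something like average color) and the image names are hypothetical, and a real system would likely draw on much richer features.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_sequence(features, start=0):
    """Order images so each one is followed by its most similar unused neighbor.

    `features` is a list of numeric vectors, one per image. This is a
    hypothetical stand-in for whatever similarity measure a real system uses.
    Returns a list of indices into `features`.
    """
    remaining = set(range(len(features)))
    order = [start]
    remaining.remove(start)
    while remaining:
        last = features[order[-1]]
        # Pick the closest remaining image; break ties by lowest index.
        nxt = min(remaining, key=lambda i: (distance(last, features[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Made-up feature vectors (roughly: average red, green, blue), for illustration.
images = {
    "snow field": (0.9, 0.9, 0.95),
    "gravel courtyard": (0.6, 0.6, 0.6),
    "ocean waves": (0.1, 0.3, 0.5),
    "snowy vista": (0.85, 0.9, 1.0),
}
names = list(images)
order = greedy_sequence([images[n] for n in names])
sequence = [names[i] for i in order]
print(sequence)
```

With these invented numbers, the two snow scenes end up adjacent, followed by the gravel and then the water, which loosely echoes the kind of run described above. A greedy ordering only looks one step ahead, so it can leave awkward jumps near the end of a long sequence.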

This careful sequencing exercise takes place on an endless leporello fold that wanders rhythmically back and forth with incremental page flips (it is printed on both sides) and billows out when allowed to break its rigid stepping forward. The design and construction choices of the photobook here match the content well, the fonts and graphic elements made minimal to reflect the precision of the computer processing.

As contemporary photography becomes more and more computational, the interactions between human and machine thinking will likely become more common. Computed Curation shows us just how far apart these worldviews are at the moment, almost as if we are speaking entirely different languages and using a less than perfect translator as an intermediary. But just as we need to make extra effort to understand the perspectives of people from foreign lands, we also need to increasingly recalibrate our own heads to be more aware of how computers are thinking (and changing). Schmitt’s smart photobook sparks all kinds of questions about how images can and should be interpreted, making us look at even the most mundane of snapshots with a curiously fresh set of eyes.

Collector’s POV: Philipp Schmitt does not appear to have consistent gallery representation at this time. Interested collectors should likely follow up directly with the artist via his website (linked in the sidebar).
