My neighbor has the most beautiful garden ever.
Season after season, she grows the most exotic, gorgeous plants that I could never find in any local nursery. Slightly green with envy over her green thumb, I discovered a glimmer of hope.
There are apps that will identify any plant you take a photo of. Problem solved. Now the rest of the neighborhood is getting prettied up as several houses, including mine, have sprouted exotic new blooms easily ordered online.
Take a photo, get an answer. The most basic form of visual search.
Visual search addresses both convenience and curiosity. If we wanted to learn something more about what we’re looking at, we could simply upload a photo instead of trying to come up with words to describe it.
This isn’t new. Google Visual Search was demoed back in 2009. CamFind rolled out its visual search app in 2013, following similar technology that powered Google Glass.
What’s new is that a storm of visual-centric technologies is coming together to point to a future of search that makes the keyword less…key.
Artificial intelligence and machine learning are the critical new components in the visual game. Let’s focus on what this means and how it’s going to impact your marketing game.
How many kinds of reality do we actually need?
The first thing we think about with the future of visual is virtual reality or augmented reality.
There’s also a third one: mixed reality. So what’s the difference between them and how many kinds of reality can we handle?
Virtual reality (VR) is full immersion in another universe – when you have the VR headset on, you cannot see your actual reality. Virtual reality is a closed environment, meaning that you can only experience what’s been programmed into it. Oculus Rift is an example of virtual reality.
Augmented reality (AR) uses your real environment, but enhances it with the addition of a computer-generated element, like sound or graphics. Pokémon Go is a great example of this, where you still see the world around you but the Pokémon-related graphics – as well as sounds – are added to what you see.
Mixed reality (MR) is an offshoot of augmented reality, with the added element of augmented virtuality. Here, it merges your virtual world with your real world and allows you to interact with both through gestures and voice commands. HoloLens from Microsoft (my employer) is an example of mixed reality – this headset can be programmed to layer on and make interactive any kind of environment over your reality.
The difference is a big fat deal – because an open environment, like HoloLens, becomes a fantastic tool for marketers and consumers.
Let me show you what I mean.
Pretty cool, right? Just think of the commercial implications.
Retail reality
Virtual and augmented reality will reshape retail because they solve a problem – for the consumer.
Online shopping has become a driving force, and we already know what its limitations are: not being able to try clothing on, feel the fabric on the couch or get a sense of the heft of a stool. All of these are obstacles to the online shopper.
According to the Harvard Business Review, augmented reality will eliminate pain points that are specific to every kind of retail shopping – not just trying on the right size, but think about envisioning how big a two-man tent actually is. With augmented reality, you can climb inside it!
If you have any doubt that augmented reality is coming, and coming fast, look no further than the recent Pokémon Go craze. We couldn’t get enough.
Some projections put investment in AR technology at close to $30 billion by 2020 – that’s in the next three years. HoloLens is already showing early signs of being a game-changer for advertisers.
For example, if I’m shopping for a kitchen stool, I can not only look at the website but also see what it would look like in my home:
It’s all about being able to get a better feel for how things will look.
Fashion is one industry that has tried to find ways to solve for this and is increasingly embracing augmented reality.
Rebecca Minkoff debuted the use of augmented reality in her New York Fashion Week show this September. Women could use AR app Zeekit – live during the show – to see how the clothes would look on their own body.
Why did they do this? To fix a very real problem in retail.
According to Uri Minkoff, who is a partner in his sister’s clothing company, 20 to 40 percent of purchases in retail get returned – that’s the industry standard.
If a virtual try-on can eliminate the hassle of the wrong fit, the wrong size, the wrong everything, then they will have solved a business problem while also making their customers super happy.
This trend caught on and at London Fashion Week a few weeks later there were a host of other designers following suit.
Let’s get real about reality
Let’s bring our leap into the visual back down to earth just a bit – because very few of us will be augmenting our reality today.
What’s preventing AR and VR from taking over the world just yet is slow market penetration. AR and VR are relatively expensive and require entirely new hardware.
On the other hand, something like voice search – another aspect of multi-sensory search – is becoming widely adopted because it relies on a piece of hardware most of us already carry with us at all times: our mobile phone.
The future of visual intelligence relies on tying it to a platform that is already commonly used.
Imagine this. You’re reading a magazine and you like something a model is wearing.
Your phone is never more than three feet from you, so you pick it up, snap a photo of the dress, and the artificial intelligence (AI) – via your digital personal assistant – uses image search to find out where to buy it, no keywords necessary at all.
Take a look at how it could work:
Talk about a multi-sensory search experience, right?
Voice search and conversation as a platform are combined with image search to transact right within the existing platform of your digital personal assistant – which is already used by 66% of 18- to 26-year-olds and 59% of 27- to 35-year-olds, according to Forrester Research.
As personal digital assistants rise, so will the prevalence of visual intelligence.
Digital personal assistants, with their embedded artificial intelligence, are the key to the future of visual intelligence in everybody’s hands.
What’s already happening with visual intelligence?
Amazon
One of the most common uses exists right within the Amazon app. Here, the app gives you the option to find a product simply by taking a photo of something or of the bar code:
CamFind
The app CamFind can identify the content of pictures you’ve taken and offer links to places you could shop for it. Their website touts the fact that users can get “fast, accurate results with no typing necessary.”
For example, I took a photo of my (very dusty) mouse and it not only recognized it, but also gave me links to places I could buy it or learn more about it.
Pinterest already has a handy visual search tool for “visually similar results,” which returns results from other pins that are a mix of commerce and community posts. This is a huge benefit for retailers to take advantage of.
For example, if you were looking for pumpkin soup recipe ideas and came across a kitchen towel you liked within the Pin, you could select the part of the image you wanted to find visually similar results for.
Google’s purchase of Moodstocks is also very interesting to watch. Moodstocks is a startup that has developed machine learning technology to boost image recognition for the cameras on smartphones.
For example, you see something you like. Maybe it’s a pair of shoes a stranger is wearing on the subway, and you take a picture of it. The image recognition software identifies the make and model of the shoe, tells you where you can buy it and how much it costs.
Captionbot.ai
Microsoft has developed an app that describes what it sees in images. It understands thousands of objects as well as the relationship between them. That last bit is key – and is the “AI” part.
Captionbot.ai was created to showcase some of the intelligence capabilities of Microsoft Cognitive Services, such as Computer Vision, Emotion API, and Natural Language. It’s all built on machine learning, which means it will get smarter over time.
You know what else is going to make it smarter over time? It’s integrated into Skype now. This gives it a huge practice field – exactly what all machine learning technology craves.
As I said when we first started, where we are now with something like plant identification is leading us directly to the future with a way of getting your product into the hands of consumers who are dying to buy it.
What should I do?
Let’s make our marketing more visual.
We saw the signs with rich SERP results – we went from text only to images, videos and more. We’re seeing pictures everywhere in a land that used to be limited to plain text.
Images are the most important deciding factor when making a purchase, according to research by Pixel Road Designs. They also found that consumers are 80% more willing to engage with content that includes relevant images. Think about your own purchase behavior – we all do this.
This is also why all the virtual reality shenanigans are going to take root.
Up the visual appeal
Without the keyword, the image is now the star of the show. It’s almost as if the understudy suddenly got thrust into the spotlight. Are they ready? Will they succeed?
To get ready for keywordless searches, start by reviewing the images on your site. The goal here is to ensure they’re fully optimized and still recognizable without the surrounding text.
First and foremost, we want to look at the quality of the image and answer yes to as many of the following questions as possible:
- Does it clearly showcase the product?
- Is it high-resolution?
- Is the lighting natural with no distortive filters applied?
- Is it easily recognizable as being that product?
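The resolution question in the checklist above is one that can be checked mechanically. Here is a minimal Python sketch, assuming the product images are PNG files and assuming a 1000-pixel minimum on the shorter side. That threshold is an illustrative guess, not a figure published by any search engine, and the function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SHORT_SIDE = 1000  # assumed threshold; tune it to your own catalog


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    return struct.unpack(">II", data[16:24])


def is_high_resolution(data: bytes, min_short_side: int = MIN_SHORT_SIDE) -> bool:
    """True when the shorter side of the image meets the assumed minimum."""
    width, height = png_dimensions(data)
    return min(width, height) >= min_short_side


# Usage with a hypothetical file name; only the first 24 bytes are needed:
# with open("oak-kitchen-stool.png", "rb") as f:
#     print(is_high_resolution(f.read(24)))
```

The other three questions (clear showcase, natural lighting, recognizability) still need a human eye.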
Next, we want to tell the search engines as much about the image as we can, so they can best understand it. For the same reasons that SEOs can benefit by using Schema mark-up, we want to ensure the images tell as much of a story as they can.
The wonderfully brilliant Ronell Smith touched upon this subject in his recent Moz post, and the Yoast blog offers some in-depth image SEO tips as well. To summarize a few of their key points:
- Make sure file names are descriptive
- Provide all the information: titles, captions, alt attribute, description
- Create an image XML sitemap
- Optimize file size for loading speed
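Of the points above, the image XML sitemap is the easiest to automate. Here is a minimal Python sketch, assuming you already have a mapping of page URLs to the image URLs on each page. The `build_image_sitemap` function and the example.com URLs are made up for illustration; the two namespaces are the standard sitemap namespace and Google's image sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict) -> str:
    """Build an image sitemap from a {page_url: [image_url, ...]} mapping."""
    # Register prefixes so the output uses <url> and <image:image>
    # instead of auto-generated ns0/ns1 prefixes.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page and image; note the descriptive file name.
sitemap_xml = build_image_sitemap(
    {"https://example.com/stools": ["https://example.com/img/oak-kitchen-stool.jpg"]}
)
print(sitemap_xml)
```

Save the output as an XML file on your site and submit it through your search engine's webmaster tools as you would any other sitemap.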
Fairly simple to do, right? This primes us for the next step.
Take action now by taking advantage of existing technology:
1. Pinterest:
On Pinterest, optimize your product images for clean matches from lifestyle photos. You can reverse-engineer searches to your products via the “visually similar results” tool by posting pins of lifestyle shots (always more compelling than a white background product shot) that feature your products, in various relevant categories.
In August, Pinterest added video to its visual search machine learning functionality. This tool is still working out the kinks, but keep your eye on it so you can create relevant content with a commerce view.
For example, a crafting video about jewelry might be tagged with places to buy the tools and materials in it.
2. Slyce:
Integrate Slyce’s astounding tool, which gives your customer’s camera a “buy” button. Using image recognition technology, the Slyce tool activates visual product recognition.
Does it work? There are certainly several compelling case studies from the likes of Urban Outfitters and Neiman Marcus on their site.
3. Snapchat:
Snap your way to your customer, using Snapchat’s soon-to-come object recognition ad platform. This lets you deliver an ad to a Snapchatter by recognizing objects in the pictures they’ve just taken.
The Verge shared images from the patent Snapchat had applied for, such as:
For example, someone who snaps a pic of a woman in a cocktail dress could get an ad for cocktail dresses. Mind-blowing.
4. Blippar:
The Blippar app is practically a two-for-one in the world of visual intelligence, offering both AR as well as visual discovery options.
They’ve helped brands pave the way to AR by turning their static content into AR interactive content. A past example is Domino’s Pizza in the UK, which allowed users of the Blippar app to interact with their static posters to take actions such as download deals for their local store.
Now the company has expanded into visual discovery. When a user “Blipps” an item, the app will show a series of interrelated bubbles, each related to the original item. For example, “Blipping” a can of soda could result in information about the manufacturer, latest news, offers, and more.
Empowerment via inclusivity
Just in case you imagine all the developments are here to serve commerce, I wanted to share two examples of how visual intelligence can help with accessibility for the visually impaired.
TapTapSee
From the creators of CamFind, TapTapSee is an app specifically designed for the blind and visually impaired.
It recognizes objects photographed and identifies them out loud for the user. All the user needs to do to take a photo is to double tap on the device’s screen.
The Seeing AI
Created by a Microsoft engineer, the Seeing AI project combines artificial intelligence and image recognition with a pair of smart glasses to help a visually impaired person better understand who and what is around them.
Take a look at them in action:
While wearing the glasses, the user simply swipes the touch panel on the eyewear to take a photo. The AI will then interpret the scene and describe it back out loud, using natural language.
It can describe what people are doing, how old they are, what emotion they’re expressing, and it can even read out text (such as a restaurant menu or newspaper) to the user.
Innovations like this are what make search even more inclusive.
Keep Calm and Visualize On
We are visual creatures. We eat first with our eyes, we love with our eyes, we become curious with our eyes.
Using the camera as the new search box is brilliant. It removes obstacles to search and helps us get answers in a more intuitive way. Our technology is adapting to us, to our very human drive to see everything.
And that is why the future of search is visual.
Aside from taking advantage of existing technology to meet the keywordless world more visually, as I mentioned above, what do you think marketers can do to prepare for this world? How will you adapt your strategies for the picture-centric consumer?
It's sad to see, but we are becoming like the "Humans" in WALL-E... just trying to find virtual realities to avoid our own reality. The funny thing is that if someday we have a problem and have no electricity, we won't be able to "survive".
What I approve of is adapting "virtual engines" to work and play. It makes some types of work easier.
The problem is that every "virtual reality" has a different name and different use.
In my opinion it is not a good idea to try to enter this market, because there are corporations like Google, Facebook, Microsoft or Sony that are investing millions of dollars in it, and everything they do, they will do better than any of us...
Better to focus on the keywords they don't use or the ones they missed, but that's not something I will invest in.
Camille :).
You mean like "Human robots"? lol
I thought the same when I saw the Facebook convention and everyone with the "Virtual Glasses"...
Exactly, why would you want a dog that eats and that you have to take care of and spend money and time on, if you can have a virtual dog that doesn't die and that you can play with only when you want...
What is going to be next? Your virtual family?...
The world went crazy a long time ago :/ but in a technology-addicted world, it's even easier to control people...
It's really early today, but I think that maybe with a mix of beacons + visual recognition we can do our best maybe in 5 years ;)
Hi Sergio, that's an excellent point! I'm so excited to see how this would work in conjunction with beacons. I'm getting all fired up thinking about the possibilities. Being a marketer in this constantly evolving phase is wonderful :) Thanks so much for reading!
What do you call the contact lens just like what Brad Pitt used in Mission Impossible 5???
As the article says, the alt and title attributes seem to be a very important point here. Google is going to make us work for it: it will collect the details and data that SEOs and webmasters provide and can use its databases to develop a way to serve results by image, but I think that is because of the TEXT data, not because of some "magic of the images".
Hi webtematica! I totally hear you on the text elements-- do know that they are but one aspect. A lot of the tech and image search-only bots involve actual image recognition, so no keywords necessary at all.
e.g. Captionbot.ai. You wouldn't need to have any words associated with it, and the AI will learn to look at and understand the image in a similar way humans would.
Hence, it's still important to have images that are of high-quality in good lighting.
I just tried Captionbot.ai with a Panamá flag, and the app told me it thinks it is an umbrella. xDDDD Anyway, it asked me how well it did and asked me to rate it. Is that probably a "text" way to feed their database?
Purna, I really love how much meat there is to this post. Seeing the title I sort of had a "Simpsons Did It" moment thinking back to how revolutionary Google Goggles seemed at the time and how unfulfilled those promises have turned out to be, but you've done a spectacular job of showing that there are many firms advancing this frontier in useful and sometimes unexpected ways.
I'm curious if the Pinterest utility highlighted above ends up detouring users away from their original intent. I'm sure it serves a goal of keeping the user within Pinterest longer, it might be interesting to see if there's anyone with a good strategy to use this for discovery based on popular influencer (or even competitor) pins.
I think the key to the success of many of these apps and utilities is the volume/frequency of new scenarios the user finds themselves in. I particularly identify with the example you gave because I've been seeing lots of folks on the subway and walking around NYC with cool sneakers but would never stop them to ask where they got them. Likewise, anyone who travels can understand wanting to get more info easily while out discovering a new place.
However, if you're someone who lives a fairly routine life and isn't in new places or around new people very often, does visual search provide as much added utility? I'm not sure that taking a picture of something in a familiar setting provides a better UX than what is presently available. Until that can be unlocked, I'd venture that any viable future in this genre will come from being integrated into the OS or being embedded in an app that already has a large enough user base to reach the tipping point.
Hi Ryan, Firstly thank you so much for your kind words, it really makes my day :)
Re: your last point, certainly that is true in many many cases. What we see now with tech is that it is solving things we may not immediately realize we had a need for. E.g. Twitter on Amazon Echo, which makes Alexa read out your texts to you. At first, I didn't think I'd use it much but turns out it is convenient.
Which is also the case with using your camera on the phone. It's super simple, easy and always with us. If one wanted to order a particular brand of product, all you had to do was take a pic of something and then Amazon brings it up. Easier than typing. It won't replace typing, but simply provides another convenient option.
Therein lies its path to success. Plus, think about the large retailers already using it now-- it's the most popular brands in the country. If we get used to them offering this option, then will we expect this to be the "norm" everywhere?
We already try to turn offline shopping into online-- 90% of us use our phones in stores according to a study published on Marketing Land. The main uses are to find deals, reviews or more information. There's the convenience angle again.
I'm super excited to watch and see how this grows-- happy to chat about this anytime!
Very interesting information Purna! That Cortana video is amazing! I'm still wondering how they manage to recognize what you are showing out of a simple picture. I also wonder what if they had some kind of always-activated programme so that they can recognize all you see with your phone even if you are just using it in a park... I mean, could Google or Microsoft be getting all the information they can from our mobile camera?
I heard they are doing it with the microphone so would think they could do it with the camera as well...
As always when we go further with these new technology steps, I'm worried about my privacy and how that's managed.
Thanks for sharing the info.
BR!
David
Thank you so much David!
How they recognize what we're showing is the power of visual search...and that's truly why we need to ensure we have good quality images for the visual search engines to recognize and make matches. We see it happen with the Amazon app today too right? It's common and will increasingly become more widely used.
I really appreciate you taking the time to read and comment.
The Hololenses and Oculuses of the world are a ways off before they are going to be ubiquitous in everyday life. Why? The hardware. Until the hardware is both inexpensive as well as seamlessly integrates into the world, as in not being big and bulky, or uber-geeky (cough, cough... Google Glass), then AR and VR are going to be used for short term, niche uses.
The near term future belongs to the smartphone, because it is ubiquitous, and seamlessly fits into society already.
I love the term "keyword-less search"! It's really opened my eyes more to the transition that is happening in search. It's like the internet is reinventing itself (or maybe "broadening" itself is a better term). As SEOs, we're going to have to change and either reinvent or broaden ourselves (again).
It amazes me how it is changing the landscape of virtual reality and everything else on the market. More and more curious apps are being developed, and I think it's a good alternative for powerful and strong businesses.
My question to the Moz community is: do you think that a small business can actually implement these things? Personally I think so, but it should take good care of every aspect, because a crappy app can do more to harm the brand than to strengthen it. What do you think?
Hi Enrique,
It's always good to be inspired by things on the grand scale and then see how they can be distilled down to what can work best for you. AR/VR will become more commonly used in the future, though at the moment any business, regardless of size, can take advantage of just image search and image recognition advances. Thanks so much!
I can't wait until this technology is more advanced and refined. Visual intelligence can be very buggy right now. I imagine this will improve as our camera phones become more sophisticated.
Hi David, I can imagine it's just going to get better...the fact that so much of it exists and is so common (i.e. Amazon search by image or Slyce) is actually a sign that it's coming sooner than one might think. Thanks so much for taking the time to comment.
Most of us in this game still think of the world in terms of the written word. I know I do, and I'm afraid on the whole that's how I prefer to consume content. But we need to be thinking increasingly about how we use the real world around us and incorporate thoughts about that into what we're doing. But this is much less about static images - it's about how video, words and images can be brought together to engage someone and root their search in the reality that's in front of them. I think that's going to be a huge challenge - because it's a massive undertaking!
Hi Simon, yes that's certainly going to be the case for the world of virtual or augmented reality. How exciting is it to be working in a world where so many cool things from sci fi movies are becoming a reality?
In the meantime though, the humble static image can still provide a lot of value to retailers as the audience will search for things with images directly as well. We see it happening already, it's going to pick up pace.
Thanks so much for taking the time to comment!
I love the idea of visual search - but doesn't that also mean I would never have to talk to another human ever?
With the right hardware this would make life so easy, especially for us visual people who struggle with words.
Hi Christina,
I'm so glad you like the idea of it-- I don't believe it will replace human conversation. It's simply another, more convenient way to search for and discover things we're interested in.
I agree, it will make things a lot easier for visual people. Thank you so much for taking the time to read and comment!
Penguin 'pushed' the web from simple words i.e. descriptions or even KW stuffing toward stories, toward the interesting content... How does one tell a story with (static) pictures? I can only think of a comic book or a similar sequence of pics. Now the reason why this format is neglected is that video is more comfortable, easy to consume... Perhaps such a sequence could be the way to put a meaningful wrapper around static pics
This article/page is the perfect example of everything you discussed. That's why you came up first in my search result. My search query was kinda dumb, but Google figured out what I was trying to search for and brought me to this page.
It's interesting and all, but if you have pictures on your website where the product cannot be recognized, you've got a lot more problems than not "qualifying" to be found by these apps. Most of them, by the way, still use keywords to search; they just pick them for you, so it's not actually keyword-less.
Hi Igor,
I agree, if there are pictures where the product can't be recognized, that would be a problem for the business. While some of the current search engines may still factor in the texts, we're seeing a rise of AI which is purely image recognition. No words needed, hence we should be preparing ourselves by ensuring our images are of high quality and easily recognizable by the AIs.
Thanks so much for taking the time to comment!
Virtual reality has reached the general public and is here to stay. We see giants like Google, Microsoft, Sony and Facebook betting heavily that this technology will be the real revolution in the coming years. Great post.
Great article. We have an interesting future ahead.
I am very interested in the applications of augmented reality for retail that you mentioned. You briefly mentioned the possibility of feeling the texture or even the weight of the items. Is there any viable solution yet? And if not, do you know who is working on its development and what their expectations are?
Hi Alejandro! Ahh, sorry. I mentioned that a problem with online retail is being unable to feel the weight or texture of the items. So far there is no technology that can let you replicate that.
What we can do with AR/VR/holograms is gauge how things will look in our environment. HoloLens is an open platform, so you can look into developing for that.
I'm with you though, I'm waiting for the day when we can pass chocolate through the TV like we saw in the Charlie and the Chocolate Factory movie :)
Thanks so much for your thoughtful note!
Thank you so much, Purna, for picking a different topic for digital marketing. People understand visuals more easily than keywords, and visuals are easier to remember. These days, video marketing, infographics and freshly captured images are making a big noise in the search industry. Google takes unique things seriously and puts you first in the search results.
Technology is really growing, so I'm not that surprised by this info. But this is a good post; I learned something from it.