Soundscaping the world with Amos Miller

Product Strategist Amos Miller

Episode 54, December 12, 2018

Amos Miller is a product strategist on the Microsoft Research NeXT Enable team, and he’s played a pivotal role in bringing some of MSR’s most innovative research to users with disabilities. He also happens to be blind, so he can appreciate, perhaps in ways others can’t, the value of the technologies he works on, like Soundscape, an app which enhances mobility independence through audio and sound.

On today’s podcast, Amos Miller answers burning questions like how do you make a microwave accessible, what’s the cocktail party effect, and how do you hear a landmark? He also talks about how researchers are exploring the untapped potential of 3D audio in virtual and augmented reality applications, and explains how, in the end, his work is not so much about making technology more accessible, but using technology to make life more accessible.


Episode Transcript

Amos Miller: Until you are out there in the wind, in the rain, with the people, experiencing, or at least trying to get a sense for the kind of experience they’re going through, you’ll never understand the context in which your technology is going to be used. It’s not something you can imagine, or glean from secondary data, or even from video or anything. Until you are there, seeing how they grapple with issues that they are dealing with, it’s almost impossible to really understand that context.

(music plays)

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Amos Miller is a product strategist on the Microsoft Research NeXT Enable team, and he’s played a pivotal role in bringing some of MSR’s most innovative research to users with disabilities. He also happens to be blind, so he can appreciate, perhaps in ways others can’t, the value of the technologies he works on, like Soundscape, an app which enhances mobility independence through audio and sound.

On today’s podcast, Amos Miller answers burning questions like how do you make a microwave accessible, what’s the cocktail party effect, and how do you hear a landmark? He also talks about how researchers are exploring the untapped potential of 3D audio in virtual and augmented reality applications, and explains how, in the end, his work is not so much about making technology more accessible, but using technology to make life more accessible. That and much more on this episode of the Microsoft Research Podcast.

Host: Amos Miller, welcome to the podcast.

Amos Miller: Thank you. It’s great to be here.

Host: You are unique in the Microsoft Research ecosystem. Your work is mission-driven. Your personal life strongly informs your professional life and, we’ll get more specific in a bit. But for starters, in broad strokes, tell us what gets you up in the morning. Why do you do what you do?

Amos Miller: I’ve always been passionate about technology from a very young age. But, really, in the way that it impacts people’s lives. And it’s not a mission that I necessarily knew about when I went through my career and experiences with technology. But when I look back, I see that those are the areas where I could see that a person feels differently about themselves or about the environment as a result of their interaction with that technology. That’s where I thought okay, that is having meaning to this person. And I have this huge, wonderful opportunity to do what I do in Microsoft Research to actually have turned that passion into my day job, which is very… I feel extremely fortunate with that. And I sometimes have to pinch myself to see that it’s not a dream.

Host: Well, tell us a little bit about your background and how that plays into what you are doing here.

Amos Miller: I’m very much a person that grew up in the technology world. I also moved between a number of countries over my career, and my life. I grew up in Israel. I spent many years in the UK, in London. I spent a few other years in Asia, in Singapore, and now I’m here, so all of these aspects of my life have been very important to me. I also happen to be blind. I suffer from a genetic eye condition called retinitis pigmentosa. It was diagnosed when I was five and I gradually lost my sight. I started university with good enough sight to manage and finished university with a service dog and any kind of technology I could find to help me read the whiteboard, to help me read the text on the computer. And I’d say by the age of 30, I totally stopped using my sight. And that’s when I really started living life as a fully blind person.

Host: Let’s talk about your job for a second. You are a product strategist at Microsoft Research, so how would you describe what you do?

Amos Miller: So, I work in a part of the organization at Microsoft Research that looks at really transferring technology ideas into impact. Into a way that they impact business, impact people. A good idea will only have an impact when it’s applied in the right way, in the right environment, so that the social, the business, the technological context in which it operates is going to make it thrive. Otherwise it doesn’t matter how good it is, it’s not going to have an impact.

Host: Right. So, let’s circle over to this previous role you had which was in Microsoft’s Digital Advisory program. And I bring it up inasmuch as it speaks to how often our previous work can inform our current work, and you referred to that time as your “customer-facing life.” How does it inform your role as a strategist today?

Amos Miller: What always energizes me is when I see and observe the meaning and the impact that technology can really have for people. And I don’t say it lightly. Until you are out there in the wind, in the rain, with the people, experiencing, or at least trying to get a sense for the kind of experience they are going through, you’ll never understand the context in which your technology is going to be used. It’s not something you can imagine, or glean from secondary data, or even from video or anything. Until you are there, seeing how they grapple with the issues that they are dealing with, it’s almost impossible to really understand that context. And the work that I’ve done in, actually, my first nine years in Microsoft, I worked in a customer-facing part of the business, in the Strategic Advisory Services, today known as the Digital Advisory Services. It’s work that we do with our largest customers around the world to really help them figure out how they can transform their own businesses and leverage advancements in technology.

Host: Right. So now, as you are working in Microsoft Research, as a product strategist, how does that transfer to what you do today?

Amos Miller: First of all, I want to introduce, for a moment, the team that I work with, which is the Enable team in Microsoft Research. And the Enable team is looking at technological innovations, especially with disabilities in mind. In our case, our two primary groups are people with ALS and people who are blind. As a product strategist, my role is to work across the research, engineering, marketing and our customer segment and really figure out and understand how we can harness what we have from a technology perspective and, as an organization, to maximize and have that impact that we aspire to have with that community. And that takes a great deal of – again, going back to my earlier point – spending time with that community, going out there and spending time, in my case, with other people who are blind because I only know my own experience. I don’t have everybody else’s experience. The only way for me to learn about that is to be out there. And in our team, every developer goes out there to spend time with end users because that’s the only way you can really get under the covers and understand what’s going on.

Host: Right.

(music plays)

Host: So, the website says you drive a research program that “seeks to understand and invent accessibility in a world” – this is the fun part – “where AI agents and mixed reality are the primary forms of interaction.” It sounds kind of sci-fi to me…

Amos Miller: A little bit. Let me unpack that a little bit. When we traditionally think about accessibility, we think about, how do you make something accessible? So how do you make a microwave more accessible? Well, there isn’t anything inherently inaccessible in putting a piece of pizza and warming it up in the microwave. The only reason it’s inaccessible is because the microwave was designed in an inaccessible way. It could have been accessible from the beginning.

Host: Sure.

Amos Miller: But the world we are moving to is, it’s not about me operating the microwave, it’s not about the accessibility of the microwave, it’s about me preparing dinner for my family. That’s the experience that I’m in. And there’s a bunch of technologies that support that experience. And that experience is what I am seeking to make an accessible and inclusive experience.

Host: Okay.

Amos Miller: That means that we are no longer talking about the microwave, we are talking about a set of interactions that involve people, that involve technology, that involve physical things in the environment. It’s not about making the technology accessible, it’s about using technology to make life more accessible, whether you are going for a walk with a friend, whether you are going to see a movie with a friend, whether it’s sitting in a meeting and brainstorming a storyboard for video. All of these are experiences, and the goal is, how do you make those experiences accessible experiences? That kind of gets you thinking about accessibility in a very different way, where your interaction is with the person that you are sitting in front of. The technology is just there in support of that interaction.

Host: Right. As I’m researching the interview, I find myself thinking of the various solutions – maybe the “technical guide dog” mentality – like let’s replace all these things, with technology, that people have traditionally used for independence. And the technology as it enters that ecosystem, some people might think the aim is to replace those things, but I don’t think that’s the point of what’s going on here. Am I right?

Amos Miller: That’s right. There is a tendency, when you come at a problem with a technology solution, to look at what you are currently doing and replace that with something that’s automatic. Right? Oh, you are using a guide dog? How can I replace that guide dog and give you a robot? So, I work on technology that enhances mobility independence through audio and sound, which we’ll talk about in a minute.

Host: Right.

Amos Miller: But often people ask me, how would that work for people who can’t hear? And the natural inclination to them is to say, oh, okay, well you’ll have to deliver the information in a different way. The thing is that people get a sense of their space and their surroundings using the senses that they have. To me, the question is not, how do we shortcut that? It’s how do they sense their space today? They do. They don’t sit there feeling completely disconnected. And if you are going to intervene in that, you better be consistent with how they’re experiencing it today.

Host: Yeah, and that leads me right into the next question because you and I talked earlier about the fundamental role that design plays in the success of human computer interaction. And I’m really eager to have you weigh in on the topic. Let’s frame this broadly in terms of assumptions. And that’s kind of what you were just referring to.

Amos Miller: Yeah.

Host: You know, if I’m looking at you and I think, well my solution to how you interact with the world with technology would be Braille, that’s an assumption. So, I’m just going to give you free rein here. Tell us what you want us to know about this from your perspective.

Amos Miller: We all make assumptions about other people’s experience of life. You are referring to Bill Buxton who was on your podcast a few weeks ago.

Host: Right.

Amos Miller: And he’s actually been a very close friend and mentor throughout the work that we are doing on Soundscape, which we’ll talk about in a minute. And he’s really brought to our attention that what we’ve done, of going out there and experiencing the real situation that people are experiencing, is about empathy and it’s about trying to understand and probe ideas that challenge your assumptions about what effect they will have. But, really seeing, observing and understanding their experience in that particular situation, and then maybe applying, from your learning, some form of intervention into that experience and observing how that affects that experience. It doesn’t have to be a complete piece of software or technology, it’s just an intervention. It can be completely low-fi. That helps you to start expanding your understanding. And you don’t have to do it with 100 people. Do it with two… three people. You will discover a whole new world you didn’t know about. I’m sorry, but you don’t need 200 data points to support that experience, you’ve just seen it. And you can build on that. So, can you enhance that, in any way, to give them an even richer awareness of their surroundings? And those are the kind of questions that taking design through that very experiential lens has raised, and it has led us to the work that we are actually doing on Soundscape, which is the technology that we’ve been developing over the last few years, to really see how far we can take this notion of how people perceive the world and how you can enhance that so their perception is enhanced.

(music plays)

Host: Well, let’s talk about 3D sound and an exciting launch earlier this year in the form of Microsoft’s Soundscape. This is such a cool technology with so many angles to talk about. First, just give our listeners an overview of Soundscape. What is it, who is it for, how does it work, how do people experience it?

Amos Miller: Soundscape is a technology that we developed in collaboration with Guide Dogs, certainly in the early stages, and still do. And the idea is very much using audio that’s played in 3D. Using a stereo headset, you can hear the landmarks that are around you and you can, thereby, really enrich your awareness of your surroundings, of what’s where in a very natural, easy way. And that really helps you feel more independent, more confident, to explore the world beyond what you know.

Host: How do you hear a landmark?

Amos Miller: How do you hear a landmark? So, for example, if you are standing and Starbucks is in front of you and to the right, we will say the word Starbucks, but we won’t say it’s in front of you and to the right, it will sound like it is over there where Starbucks is.

Host: Oh.

Amos Miller: OK? And that’s generated using, the technical term is head-related transfer functions for synthetic binaural audio. So, it’s work that actually was developed in Microsoft Research, over a number of years, by Ivan Tashev and his team. And effectively, you can generate sound to make it sound like it’s not in between your ears. You can hear it as though it’s out in the space around you. It’s really quite amazing. And we also use non-audio cues. For example, one of the ideas that we built into Soundscape is this notion of a virtual audio beacon. Not to be confused with Bluetooth beacons! It’s completely virtual. But let’s suppose that you are standing on a street corner and you are heading to a restaurant that’s a block and a half away. What you can do with Soundscape is play some audio beacon that will sound like it’s coming from that restaurant, so no matter which way you’re standing, which way you’re heading, you can always hear that “click-click” sound so you know exactly where that restaurant is. You can see it with your ears.

Host: How do you do that? How do you place a beacon someplace, technically?

Amos Miller: Binaural audio is when you have a slightly different sound in each ear which tricks the brain into having a sense of, that sound is three dimensional. It’s exactly the same way that 3D images work. Audio works almost the same. If Ivan were here, he’d say it’s not exactly the same, but by generating a slightly different soundwave in each ear, you’re able to make sound, sound like it’s coming from a specific direction. So by playing it in each ear slightly differently, it will actually sound like it’s coming from in front of you and to the right. OK? Now how do we know where to place that beacon?
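
The "slightly different sound in each ear" idea can be illustrated with one of its simplest ingredients, the interaural time difference: a sound off to one side reaches the nearer ear a fraction of a millisecond earlier. The Python sketch below uses Woodworth's spherical-head approximation. It is only an illustration and not how Soundscape renders audio, which the transcript attributes to head-related transfer functions. The head radius, the speed of sound, and both function names are assumptions made for this example.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air at about 20 C


def interaural_time_difference(azimuth_deg: float) -> float:
    """Return the arrival-time difference between the ears, in seconds.

    azimuth_deg is the source direction: 0 is straight ahead, positive is
    to the listener's right. A positive result means the sound reaches the
    right ear first. Woodworth's approximation is used, which only holds
    for sources in the front half-plane (-90 to 90 degrees).
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def delay_in_samples(azimuth_deg: float, sample_rate_hz: int = 44100) -> int:
    """Convert the time difference to a whole number of audio samples."""
    return round(interaural_time_difference(azimuth_deg) * sample_rate_hz)
```

Delaying the far-ear channel by `delay_in_samples(azimuth)` samples is enough to shift where a click seems to come from on headphones. A convincing "out in the space around you" effect needs the level and spectral differences that full head-related transfer functions also capture.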

Host: Right.

Amos Miller: At present, we – it’s largely designed to be used outdoors – so, we use GPS, so we know where you are standing. We know where that restaurant is, so we have two coordinates to work with. We also estimate which way you are facing. So, if you were facing the restaurant, we would want to play that beacon right in front of you. If you were standing at 90 degrees to the restaurant, we’d want to make that beacon sound like it’s coming not only from your right ear, but 100 meters away to your right.
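
The geometry Amos describes here (two GPS coordinates plus an estimate of which way you are facing) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Soundscape's implementation: it treats the Earth as a sphere, uses the standard haversine and initial-bearing formulas, and the function names `distance_and_bearing` and `relative_angle` are invented for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in meters, compass bearing in degrees) from
    point 1 (the listener) to point 2 (the beacon).

    Inputs are decimal degrees. The bearing is measured clockwise from
    north, in the range 0 to 360.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine formula for great-circle distance.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from point 1 toward point 2.
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return distance, bearing


def relative_angle(bearing_deg, heading_deg):
    """Angle of the beacon relative to where the listener is facing.

    0 means straight ahead, positive means to the right, negative means
    to the left. The result lies in the range -180 (inclusive) to 180.
    """
    return (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
```

For a listener facing north with the restaurant due east, `relative_angle` gives 90 degrees, which corresponds to the "standing at 90 degrees to the restaurant" case: the beacon should sound as if it comes from the right, at the distance the first function returns. Recomputing both values whenever the position or heading changes is what would keep the beacon anchored to the restaurant as the listener moves.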

Host: Unbelievable…

Amos Miller: Yeah? And so, taking all of those sensory inputs and taking the information from the map, the GPS location, the direction, we reproduce the sound image in your stereo headset so that you can hear the direction of the sound and where the thing is. And the most amazing thing is, this is all done in real time, completely dynamic. So, as you walk down the street, that restaurant may sound in front of you at 45 degrees to your right, and as you progress, you’ll hear it getting closer and closer and further and further to your right and further and further to your right. And if you overshoot it, it’ll start to sound behind you a little bit, yeah? Now, why is this so important? Because I’m not going to the restaurant on my own. I’m there with my kid or with my wife, or with my friend. And, if I were to hold a phone with the GPS instructions and all of that, I can’t hold a conversation with that person at the same time because I’m so engaged with the technology. And we talked earlier about, how do you get technology to be in the background? That beacon sound is totally in the background. You don’t have to think about it, you don’t have to attend to it mentally, it’s just there. So, you know where the restaurant is, and you continue to have a conversation with the person you are with, or you can daydream, or you can read your emails, listen to a podcast, and all of that happens at the same time. Because it’s played in 3D space, because it’s non-intrusive. You minimize the use of language. And all of these subtle aspects are absolutely crucial for this kind of technology to be relevant to this situation. You’re not sitting in front of the computer and it’s the only thing you are doing. You are outdoors. There’s a ton of things happening all the time that you have to deal with. You can’t expect the person to disassociate themselves from all of that. You know, Soundscape is one way of addressing this very, very interesting and important question. 
Throughout history, technology has always changed the way that we do things. But I think that we’re starting to see that, as technology developers, we really have to be much more mindful, right down to the subtleties of how we design something, of the relationship between the technology and the person in that situation. How can a technology do exactly the same as it has done, but do so in a way that makes the person feel empowered and develop a new skill? Great runners learn to feel their heartbeat. But if they have a heart monitor, they’ll stop feeling that heartbeat because the device on their wrist tells them what it is. Well, that’s only because that’s how it was designed. If the heart monitor, instead of telling you, you are at, I don’t know, 150, it’d say, what do you think you’re at? And you’d say, oh, I’m at 140, and it’ll say, oh, you are actually at 150. You will have learned something new from that. It’s exactly the same function, but you have developed yourself as a result of that interaction. And I think that that’s the kind of opportunity that we need to start looking for.

Host: I want to circle back to this 3D audio and the technology behind it, and something that you referred to as “the cocktail party effect.” Can you explain that a little bit and how Microsoft Research is sort of leading the way here?

Amos Miller: The cocktail party effect is an effect, in the world of psycho-acoustics, that is very simple. If you imagine you’re sitting around a table in a cocktail party having a very exciting conversation with somebody, and there are lots of other similar conversations happening around you at the same time, because all of those conversations are happening in 3D space, you are actually able to hear all of those conversations even though you are attending just to yours. You are listening and you can understand and engage in your conversation, but if your name came up in any of those other conversations, you’ll immediately turn your head and say, hey guys, what are you talking about there? And that’s an incredible capability of the brain to manage a very rich set of inputs in the auditory space that is very much under-utilized today in the technology space. We always feel that if we need to convert something into audio, it’s got to be sequenced, because we can only hear one thing at a time. When it’s in 3D, that’s no longer the case. And that’s a huge opportunity. We play a lot of that in VR and augmented reality and we spend a lot of time on the visual aspect of virtual reality and really pushing the envelope on how far we can take the use of immersive experiences in objects in all directions. But the same is available with audio. Even more with audio because your eyes are no longer engaged. Audio is in 360. If we block our ears for a moment, all of a sudden, our awareness level drops. But we are so unaware of the power of audio because vision just takes over everything. And I think the work that we have done, both in the acoustic work on 3D audio, and the application, especially in the disability space where we placed the constraints on the team – there is no vision, now let’s figure it out – and that leads to new frontiers of discovery and innovation in this space that I think could be applicable and would be applicable in many other spaces. 
And that, you know, that heads-up experience when you are out and about in the streets, not focused on the screen, but engaged in your surroundings. And that’s a perfect situation where audio has huge advantages that we can look at.

(music plays)

Host: I ask each of my guests some version of the question, what keeps you up at night? Because I’m interested in how researchers are addressing unintended consequences of the work they’re doing. Is there anything that concerns you, Amos? Anything that keeps you up at night?

Amos Miller: I think things keep me up at night because they are so interesting and yet unsolved. You know, we talked a bit about, how do you really express and portray the physical space around you in ways that utilize your other senses and really maximize the ability of the brain to make sense of places without vision? And I really think that, with Soundscape, we’ve only started to scratch the surface of that question. Over half of the brain is devoted to perception. And I think that, when we find ways to really engage, even further engage that incredible human capability, we will discover a whole new frontier of machine and human interaction in ways that we don’t understand today.

Host: You said you arrived at Microsoft Research from “left field.” What’s your story on how you came to be working on research in accessibility at Microsoft Research?

Amos Miller: I started life as a developer, and I did a business degree and joined the Strategic Advisory Services in Microsoft Consulting in the UK. And I think it was a very special moment in Microsoft, over the last few years, when we really started to understand the meaning of impacting every person on the planet with technology and seeing that as our mission. And that led to a series of conversations that opened an opportunity for us to actually get behind that statement and we basically joined Microsoft Research through that mission, through the work that we’re doing in Soundscape. And because we already had very strong relationships, thanks to some wonderful people in the company, and strong relationships here in Microsoft Research and in other parts of the company.

Host: Before we close, Bill Buxton asked me to ask you about the kayak regatta that you organized.

Amos Miller: Uh huh. Oh, we didn’t talk about that.

Host: Just tell that story quickly because I do have one question I want to wrap up with before we go.

Amos Miller: Okay. Well we talked about Soundscape as a technology that really enables you to hear signals in 3D around you. And that was largely designed to be used in the street, right? And then we thought, what would happen if we placed that audio beacon on a lake? So, we got a bunch of people during the summer hackathon and said, okay, well let’s try it out. So, we organized an event on Lake Sammamish. We hacked Soundscape to work on the lake and placed some virtual audio beacons around the lake and invited a group of people who are blind to come and kayak with us and see how they enjoy it. And they absolutely loved it. And I think that was a real eye-opener for us. You have to understand the difference here, you know? Could they kayak before? Sure, no problem, because a sighted person would be with them and tell them, okay, now you go straight, now you row left… But I’m sorry, that’s a very boring experience. You are not in control, you are not independent, you are just doing the work. And by being able to hear where those beacons are, you are truly in the driving seat. And that is a sense of independence that we’ve not really seen to that extent before we did this event.

Host: I like how you called it an eye-opening event!

Amos Miller: It was!

Host: There are so many metaphors about vision that we just sort of take for granted, right?

Amos Miller: Maybe it’s because I have prior sight, maybe not, but I, first of all, I use those metaphors all the time, and I also feel, you know, I could close my eyes and feel that my eyes are closed and open them and feel that they’re open. And I definitely take everything in in a very different way, even though the eyes don’t actually do the scientific aspect of what they’re designed to do.

Host: As we close, I always ask my guests to offer some parting advice to our listeners whether that be in the form of inspiration for future research or challenges that remain to be solved or personal advice on next steps along the career path, whether you have a guide dog with you or Soundscape… What would you say to your 25-year-old self if you were just getting started in this arena?

Amos Miller: I honestly would say, get real life experience. Especially in the areas that you are passionate about. Be passionate about them with even more energy and see the work that you do in the context of what you are passionate about. Because you can only really apply your personal experiences to what you do. It’s so great here, in Microsoft Research, to see the interns coming here in the summer. And the creativity and passion, and new perspectives that they bring to our work here. And there’s a little bit of a side of me that worries they’ll jump into the job before they’ve gone out and explored the world. And I think it’s important that they find a way to do something that gives them that meaningful context to the work that they’ll be doing here.

(music plays)

Host: Amos Miller, thank you for joining us today. It’s been – can I say it? – an eye-opening experience!

Amos Miller: Sure. My pleasure. Thanks so much for having me.

To learn more about Amos Miller and the latest innovations in audio, sound and accessibility technology, visit Microsoft.com/research
