You've got some great questions!
Mage used clairaudience to listen to a guard talking on his commlink in the next room.
I can already tell this is going to be gnarly...
Here's the question: should there be a test for this? I thought perhaps yes, because I'd normally have a perception test to hear fine details of an overheard conversation. Player argued no (and furthermore that they should hear both sides of the conversation), since they could place the "center" of the clairaudience effect directly on the commlink to get "right there" hearing. (Mage also had clairvoyance running and could use that to help specifically locate the clairaudience "effect bubble", FWIW.)
Ok. As a general principle, there's ALWAYS a test when it comes to spells. Of course, Clairaudience and Clairvoyance don't explicitly spell out what net hits give you. In cases like this, the go-to answer is "look at what the rules for that category of spells say". In the case of Detection spells, you've got that chart right there spelling out what kinds of info you can get from a detection spell at various numbers of net hits. Going by that, I'd say the caster will have needed at least 3 or 4 net hits on the spellcasting test in order to get "good enough quality remote hearing" to permit picking out that level of detail.
However, the description of the spell is that you gain remote hearing. You MIGHT instead go with having the beneficiary of the spell make a Perception (Hearing) test to pick out what's being said on the commlink. Explicitly with no bonuses from augmentations (which a spell is, lol, but I guess we have to ignore that), and since cyber hearing augmentation can't help, then surely no-essence-paid earbuds shouldn't help, either. But, as much sense as this might make at first blush, it's got the inconsistency problem (no augmentations, lol) and it also makes net hits for a spell meaningless... and from a game balance/game design point of view, that should never be true.
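If you want a feel for how steep "3 or 4 hits" actually is, here's a rough sketch. It only assumes the standard hit rule (each d6 in the pool scores a hit on a 5 or 6). The pool sizes are made-up examples, not anything from your table, and the function name is just something I picked. It counts raw hits, so net hits will come out lower once any threshold or opposing hits are subtracted, and it ignores Edge entirely.

```python
from math import comb

def chance_of_at_least(hits_needed: int, pool: int) -> float:
    """Probability of rolling at least `hits_needed` hits on `pool` d6.

    Assumes each die is an independent hit on a 5 or 6 (1-in-3 chance).
    """
    p = 1 / 3
    return sum(
        comb(pool, k) * p**k * (1 - p) ** (pool - k)
        for k in range(hits_needed, pool + 1)
    )

if __name__ == "__main__":
    # Hypothetical spellcasting pools, purely for illustration.
    for pool in (8, 12, 16):
        three = chance_of_at_least(3, pool)
        four = chance_of_at_least(4, pool)
        print(f"pool {pool}: 3+ hits {three:.0%}, 4+ hits {four:.0%}")
```

Run it with your mage's real pool to see whether a 3-or-4-hit bar is a formality or a genuine gamble before you commit to the ruling.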
Second question: if there is to be a test, is it a test based on the spell (comparing the guard's B+W or the commlink's Object Resistance to the original spellcasting hits), or a perception test as if the mage was using normal senses through the clairaudience tunnel?
IIRC in the moment I did the magic vs. guard's B&W on the principle that there should be a test for gaining important information and the mage was using a magical effect to intrude on the guard's "personal mana space". But on success, I did provide both sides of the conversation. Not sure whether that's really in line with the intent of the spell, though.
6e did not bother covering rules-territory on the topic of "what is a target of a spell" vs "what is a subject of a spell". And to my continued chagrin, neither did Street Wyrd. So, here we are just winging it.
You could, imo, very reasonably argue that as a Mana spell, Clairaudience simply cannot hear what's coming out of a technological commlink. By the same token, you could argue that Clairvoyance cannot see what's being displayed on a video screen for the same reason. But since there's no real parsing between targets and subjects, the player could also very reasonably counter that "Mana spell only means the beneficiary of the spell has to be a living creature, not that everything the spell interacts with has to be living creatures". I would counter that with a whole discussion about how targets and subjects differ, but as we said that's not really covered by the rules... and who cares what *I* think. I don't even play at your table.

It's a shame when the rules don't cover something they could have (or even should have, imo) but when that happens the GM just has to make a call and move on. I think your instincts were correct: when a spell involves a hostile or unwilling participant, there should ALWAYS be some kind of roll necessary. 9 times out of 10 that roll should be opposed, but in a case like Clairaudience/Clairvoyance, I can honestly see the unopposed success test being appropriate.
Related question: can you sustain clairvoyance/clairaudience while walking around, on the principle that you can continually spend minor actions to swap the effect between "see where I'm going" and "peer inside the nearby building"? I thought with some practice, maybe it would be fine while walking slowly, but require a Major Action while in combat / running / etc. because it would be a bit disorienting / take a lot of your attention to keep yourself in the zone of view in that case. Again, not really clear if that's reasonable, though. (The mage wanted to cast once and sustain and move around vs. casting over and over again to get a new section of the building in view.)
That's basically the implicitly intended purpose of those spells: remote recon. Who needs drones or a rigger, right? (sadface)
A couple of things to keep in mind:
1) As you imagined, the action economy involved in zooming the focus around makes it pretty impractical for combat and other "every second counts" situations.
2) When the team has a few minutes to allow the mage to throw their sight/hearing through a facility, feel free to decide that this kind of focus is more intense than sustaining other spells, and rule that it cannot be sustained with Focused Concentration while actively moving the spell around. Make them suck up the -2 dice no matter what. Or even up that to -3 dice. It's your game, man.
3) Wards are fairly cheap. And since there are no "MAGIC ISN'T REAL!" types in paying jobs in the Sixth World, anyone who's in charge of building security knows to use them. Mana Barriers are gonna stop detection spells.
4) Remember there are rules for regular, ordinary people noticing that there's magic going on around them. (pg. 129, Noticing Magic) If the magic is being centered right between a mundane's ear and his commlink, I'd even give Edge for that perception roll. Maybe even lower the threshold, because people's ears are REALLY sensitive, man. Don't touch those if you want to be subtle.