Best to do both. Big differences can be noticed quickly and are obviously the most important.
Subtle differences may take more time to hear, and judgments about them are more prone to mistakes. The optimal setup will differ for each speaker, and if you don’t get them both set up properly, the difference you perceive may result from suboptimal setup rather than something inherent in the speaker. Level matching can also be an issue, as can expectation bias.
As others have suggested, have a set group of songs that you are familiar with to use for the test. I would suggest some that are very well recorded and some that are not so well recorded.
Good luck.