Finding the path with the least overall noise from quantum memories, fibers, and gate operations in a quantum network is challenging because it requires knowledge of these noises, their sources, and their times of occurrence. In this paper, we propose a reinforcement learning-based route selection approach that uses a multi-armed bandit algorithm to find the least noisy path from a transmitter (Tx) to a receiver (Rx) without using any information about qubit decoherence caused by the probabilistic noise inherent in quantum memories and imperfect gate operations. The approach uses only network deployment knowledge to identify a set of feasible paths from Tx to Rx. We also report a key finding from a network design perspective: performing entanglement swapping on the nodes of a path in a non-synchronized, parallel manner does not always reduce the decoherence experienced in establishing end-to-end entanglement on that path. Further, we design and open-source a new simulator for the probabilistic noise encountered during entanglement distribution between Tx and Rx on a path. The simulator provides callable functions that expose this unknown network environment to the multi-armed bandit agent for interaction. Simulation results demonstrate that, for the considered quantum network, our route selection approach selects a path with up to approximately 33% better fidelity (less noise) than a conventional distance-based route selection approach.
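To illustrate how a multi-armed bandit can select a route without knowing the noise model, the following is a minimal sketch under stated assumptions. Each feasible Tx-Rx path is treated as one arm, and the agent only observes a noisy reward after each simulated entanglement-distribution round. The abstract does not name the specific bandit algorithm, so UCB1 is used here as one standard choice. `HypotheticalPathEnv`, `distribute_and_measure`, and the path fidelities are hypothetical stand-ins rather than the paper's simulator API. The Bernoulli reward, whose mean equals the path's hidden fidelity, is a simplifying assumption in place of the simulator's memory and gate noise models.

```python
import math
import random
from typing import List, Sequence


class HypotheticalPathEnv:
    """Toy stand-in for an unknown quantum-network environment (hypothetical).

    Each arm is a feasible Tx->Rx path. Pulling an arm returns a reward in
    {0, 1} whose mean equals that path's end-to-end fidelity; the agent never
    sees the fidelity directly. Bernoulli rewards are a simplifying assumption.
    """

    def __init__(self, path_fidelities: Sequence[float], seed: int = 0) -> None:
        self._fidelities = list(path_fidelities)
        self._rng = random.Random(seed)

    @property
    def num_paths(self) -> int:
        return len(self._fidelities)

    def distribute_and_measure(self, path: int) -> float:
        # One simulated entanglement-distribution round on `path`:
        # success with probability equal to the path's hidden fidelity.
        return 1.0 if self._rng.random() < self._fidelities[path] else 0.0


def ucb1_route_selection(env: HypotheticalPathEnv, rounds: int) -> List[int]:
    """Run UCB1 over the feasible paths; return how often each path was chosen."""
    counts = [0] * env.num_paths
    means = [0.0] * env.num_paths
    for t in range(1, rounds + 1):
        if t <= env.num_paths:
            path = t - 1  # try every path once before using confidence bounds
        else:
            # Pick the path with the highest optimistic estimate of its fidelity.
            path = max(
                range(env.num_paths),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = env.distribute_and_measure(path)
        counts[path] += 1
        means[path] += (reward - means[path]) / counts[path]  # incremental average
    return counts


if __name__ == "__main__":
    # Made-up fidelities for three candidate paths (illustrative only).
    env = HypotheticalPathEnv([0.5, 0.65, 0.9], seed=42)
    counts = ucb1_route_selection(env, rounds=5000)
    best = max(range(len(counts)), key=counts.__getitem__)
    print(counts, "-> selected path", best)
```

The agent concentrates its choices on the path with the highest empirical reward while still occasionally revisiting the others. This mirrors the idea of learning the least noisy route purely from interaction, with deployment knowledge used only to enumerate the candidate arms.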