SMS: OFFER: Off-environment reinforcement learning