An MRP formulation for supervised learning: generalized temporal difference learning models
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MR...
Main authors: | , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | OpenReview 2024 |