Let's start with the idea that the mental states we like are the ones that are somehow "internally consistent" or non-conflicting. Mental states that contain disharmonious representations, on the other hand, cause us mental pain. Maybe this is connected to energy efficiency of representation in the brain, but that's a tangent.
We like art because it promotes self-consistent sets of representations. Take music as an example. In the simplest case, just listening to something like a single tone occupies part of your attention. But lots of other parts of your mind aren't engaged in processing that tone. So the tone isn't extraordinarily pleasing. But if you add some variation, then more processing is required. The specific types of variations that will engage more processing depend on the listener, although there are lots of commonalities across humans.
For music to be profoundly awesome, it has to engage huge amounts of your mind. Most of the mind doesn't deal with low-level sound patterns (e.g. the specific mathematical characteristics of a waveform); it deals with human things, like people and stories and feelings and ideas. That's why great songs have complexity on many levels - including sounds that evoke higher-order sensory representations (like the way a sound of glass breaking creates a certain kind of imagery), and even things like lyrics.
To me, this is connected to things like syncopation. For example, I love the title track on Toxicity because every rhythmic pattern is punctuated by another pattern. Of course this is listener-specific again. Some people might be awe-struck by Bach's Goldberg Variations or Nancarrow's player piano pieces, but other people just perceive them as a homogeneous wash of notes. For those listeners, the pattern of the music isn't being articulated on the lower level, so those parts aren't available for synthesis/representation on a higher level.
Another thing is that the brain is good at collapsing things that have any repetition. So a good piece of music can't continue to engage a lot of your mind with a single set of patterns. The patterns have to change. The same thing happens in movies with a surprise ending: your brain has "figured out" the pattern of the first part of the movie, and then the ending forces a new representation relative to that "figured out" state.
Any given "part" of your mind can only represent "one thing" at once. So when a piece of art coordinates many parts of your mind to represent it, much of your mind is brought into a single harmonious representation.