Client-side video players employ bitrate adaptation algorithms to cater to the ever-growing QoE requirements of users. These ABR algorithms must balance multiple QoE factors, such as maximizing video bitrate and minimizing rebuffering times. Despite the abundance of recently proposed ABR algorithms, state-of-the-art schemes suffer from two practical challenges: (1) throughput prediction is difficult, and inaccurate predictions can lead to degraded performance; (2) existing algorithms use fixed heuristics which have been fine-tuned according to strict assumptions about deployment environments; such tuning precludes generalization across network conditions and QoE objectives. To overcome these challenges, we develop Pensieve, a system that generates ABR algorithms entirely using Reinforcement Learning (RL). Pensieve uses RL to train a neural network model that selects bitrates for future video chunks based on observations collected by client video players. Unlike existing approaches, Pensieve does not rely upon pre-programmed models or assumptions about the environment. Instead, it learns to make ABR decisions solely through observations of the resulting performance of past decisions. As a result, Pensieve can automatically learn ABR algorithms that adapt to a wide range of environmental conditions and QoE metrics. We compare Pensieve to state-of-the-art ABR algorithms using trace-driven and real-world experiments spanning a wide variety of network conditions, QoE metrics, and video properties. In all considered scenarios, Pensieve outperforms the best state-of-the-art scheme, with improvements in average QoE of 13.1%–25.0%. Pensieve's policies generalize well, outperforming existing schemes even on networks on which it was not trained.
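
To make the trade-off between bitrate and rebuffering concrete, the sketch below scores a playback session with a simple linear QoE function. This is only an illustration under stated assumptions: the function name `linear_qoe`, its linear form, the smoothness term, and the penalty weights are all hypothetical choices made here and are not defined in the text above. An RL-based scheme could use a score of this kind as its reward signal, but nothing here should be read as the metric actually used in the evaluation.

```python
from typing import Sequence


def linear_qoe(
    bitrates_mbps: Sequence[float],
    rebuffer_seconds: Sequence[float],
    rebuffer_penalty: float = 4.0,    # hypothetical weight, not from the text
    smoothness_penalty: float = 1.0,  # hypothetical weight, not from the text
) -> float:
    """Score one playback session; higher is better.

    Assumed form: total bitrate, minus a penalty for time spent rebuffering,
    minus a penalty for bitrate changes between consecutive chunks.
    """
    if len(bitrates_mbps) != len(rebuffer_seconds):
        raise ValueError("need exactly one rebuffer time per chunk")

    # Reward: higher bitrates mean higher video quality.
    quality = sum(bitrates_mbps)

    # Cost: every second of stalled playback is penalized.
    stall = rebuffer_penalty * sum(rebuffer_seconds)

    # Cost: large jumps between consecutive chunks are penalized.
    switches = smoothness_penalty * sum(
        abs(nxt - cur) for cur, nxt in zip(bitrates_mbps, bitrates_mbps[1:])
    )

    return quality - stall - switches


# A conservative session: low bitrate, no stalls.
conservative = linear_qoe([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])

# An aggressive session: high bitrate, but one 2-second stall.
aggressive = linear_qoe([3.0, 3.0, 3.0], [0.0, 2.0, 0.0])
```

With these assumed weights the conservative session scores 3.0 and the aggressive one scores 1.0, so the stall outweighs the bitrate gain. Changing `rebuffer_penalty` shifts that balance, which is one way to see why an algorithm tuned for a single QoE objective may not carry over to another.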