The traditional goal in parallel algorithm design is to have an algorithm with low (polylogarithmic) depth and work matching that of the best sequential algorithm for the same problem (work-efficient). Being work-efficient is desirable in that the parallel algorithm does not perform asymptotically more operations than the best sequential algorithm for the same problem, and so is efficient even when there is not much parallelism available. Having depth that is polylogarithmic is desirable in that it allows for ample parallelism. Work-efficient and polylogarithmic-depth algorithms have been developed for many fundamental problems in computing. Many of these algorithms, however, are not practical as they involve many sophisticated machinery and have large hidden constant factors in their complexity. On the other hand, many parallel algorithms used in practice are not work-efficient and polylogarithmic-depth. The goal of this project is to bridge this gap by developing parallel algorithms for shared-memory machines that are efficient both in theory and also in practice, as this allows for good performance across all possible inputs, scalability across a wide range of core counts, and graceful scalability to larger data sets.