Wearable, multisensor consumer devices that estimate sleep are now commonplace, but the algorithms used by these devices to score sleep are not open source, and the raw sensor data is rarely accessible for external use. As a result, these devices are limited in their usefulness for clinical and research applications, despite holding much promise. We used a mobile application of our own creation to collect raw acceleration data and heart rate from the Apple Watch worn by participants undergoing polysomnography, as well as during the ambulatory period preceding in-lab testing. Using this data, we compared the contributions of multiple features (motion, local standard deviation in heart rate, and "clock proxy") to performance across several classifiers. Best performance was achieved using neural nets, though the differences across classifiers were generally small. For sleep-wake classification, our method scored 90% of epochs correctly, with 59.6% of true wake epochs (specificity) and 93% of true sleep epochs (sensitivity) scored correctly. Accuracy for differentiating wake, NREM sleep, and REM sleep was approximately 72% when all features were used. We generalized our results by testing the models trained on Apple Watch data using data from the Multi-Ethnic Study of Atherosclerosis (MESA), and found that we were able to predict sleep with performance comparable to testing on our own dataset. This study demonstrates, for the first time, the ability to analyze raw acceleration and heart rate data from a ubiquitous wearable device with accepted, disclosed mathematical methods to improve the accuracy of sleep and sleep stage prediction.

Statement of Significance

Use of consumer sleep trackers is widespread, but because the type of data returned from the devices is often proprietary (e.g., "Fitbit steps") and the algorithms are typically trade secrets, most are not used by the clinical and research communities.
We wrote our own code to directly access the accelerometer on the Apple Watch. We then recorded raw acceleration, along with heart rate data as measured via photoplethysmography in the Apple Watch, during the night while participants underwent polysomnography, the gold standard for sleep tracking. We compared the output of multiple classification algorithms to ground truth polysomnography to determine which performed best. This sets the stage for greater transparency in the use of wearables to assess sleep on a large scale.
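
The sleep-wake figures reported above treat sleep as the positive class: sensitivity is the fraction of true sleep epochs scored as sleep, and specificity is the fraction of true wake epochs scored as wake. The following is a minimal Python sketch of that bookkeeping, not the authors' code. The function name `sleep_wake_metrics` and the label encoding (1 = sleep, 0 = wake, one label per epoch) are assumptions made for illustration.

```python
from typing import Dict, Sequence


def sleep_wake_metrics(truth: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compare per-epoch classifier output with polysomnography labels.

    Assumed encoding (hypothetical): 1 = sleep, 0 = wake.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same number of epochs")
    if len(truth) == 0:
        raise ValueError("at least one epoch is required")

    # Epochs where the classifier agrees with polysomnography, split by class.
    sleep_correct = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    wake_correct = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)

    n_sleep = sum(1 for t in truth if t == 1)
    n_wake = len(truth) - n_sleep

    return {
        # Fraction of all epochs scored correctly.
        "accuracy": (sleep_correct + wake_correct) / len(truth),
        # Fraction of true sleep epochs scored as sleep.
        "sensitivity": sleep_correct / n_sleep if n_sleep else float("nan"),
        # Fraction of true wake epochs scored as wake.
        "specificity": wake_correct / n_wake if n_wake else float("nan"),
    }
```

Because a night of recording typically contains far more sleep epochs than wake epochs, overall accuracy is dominated by sensitivity, which is why the three figures are worth reporting together.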