Machine learning (ML) models can accelerate the development of efficient internal combustion engines. This study assessed the feasibility of data-driven methods for predicting the performance of a diesel engine converted to natural gas (NG) spark-ignition (SI) operation, based on a limited number of experiments. As the best ML technique cannot be chosen a priori, the applicability of different ML algorithms for such an engine application was evaluated. Specifically, the study compared the performance of two widely used ML algorithms, the random forest (RF) and the artificial neural network (ANN), in forecasting engine responses related to in-cylinder combustion phenomena. The results indicated that both algorithms, with spark timing (ST), mixture equivalence ratio, and engine speed as model inputs, produced acceptable predictions of engine performance, combustion phasing, and engine-out emissions. Despite requiring more effort in hyperparameter optimization, the ANN model performed better than the RF model, especially for engine emissions, as evidenced by its larger R-squared values, smaller root-mean-square errors (RMSEs), and more realistic predictions of the effects of key engine control variables on engine performance. However, in applications where knowledge of the combustion behavior is limited, it is recommended to use an RF model to quickly determine the appropriate number of model inputs. Consequently, using the RF model to define the model structure and then using the ANN model to improve the predictive capability can help to rapidly build data-driven engine combustion models.
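
The two goodness-of-fit metrics named above, R-squared and RMSE, can be illustrated with a minimal Python sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not data or results from this study, and the sketch assumes the standard definitions of both metrics.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root-mean-square error, in the same units as the response."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical engine response (arbitrary units) and predictions from
# two hypothetical models; illustrative values only, not study data.
measured = [10.0, 12.0, 14.0, 16.0]
model_a = [10.5, 11.5, 14.5, 15.5]
model_b = [11.0, 11.0, 15.0, 15.0]

for name, predicted in (("model A", model_a), ("model B", model_b)):
    print(f"{name}: R2 = {r_squared(measured, predicted):.3f}, "
          f"RMSE = {rmse(measured, predicted):.3f}")
```

Under these definitions, the model with the larger R-squared and the smaller RMSE (model A in this made-up case) would be judged the better fit for that response.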