The Colombian Ministry of Information and Communication Technologies (MinTIC) launched the Data Science for All (DS4A) program to improve the data analysis capabilities of Colombians. The program taught participants data science and machine learning techniques.
The DS4A program was an effective platform for improving Colombian citizens' data analysis skills and bridging the gap between academia and industry. Participants were able to gain practical experience and develop solutions to business challenges by working on actual projects. In order to complete the program, students were required to collaborate with a company to develop a project that addressed a real-world problem using data science and machine learning.
Our team collaborated with an electric company that focuses on infrastructure projects. During our work together, the company emphasized the importance of conducting an economic analysis to evaluate the feasibility of self-generation and distributed generation projects. To accurately assess viability, it is essential to have knowledge of the energy price offered in the electricity system. Predicting the supply energy price would improve the financial assessment of such projects and make renewable energy more appealing to potential investors.
The market price of energy depends on many electricity system variables, including generating capacity, weather, oil prices, demand, and technical limitations. As a result, energy is a highly volatile commodity in markets worldwide. Energy market prices are publicly accessible and reflect this volatility, so we used them as the basis for predicting the energy price.
The data used for forecasting included energy generation data, vendor and pricing information, and weather-related reservoir data, since hydroelectric power plants are the main energy source in Colombia. XM, the sole operator of the electric power grid, provided this data set through its Portal Bi.
Our solution was a web application that used machine learning models to predict the price per kilowatt-hour (kWh) in the Colombian energy market. I was in charge of developing and deploying the web application, which I built using Python and Dash.
The app consists of two modules. The first module is a descriptive section that explores the datasets in depth to identify relevant information. It uses different types of graphs of the time series data to analyze behavior, trends, seasonality, patterns, and interesting relations between variables. This module aims to give a better understanding of the available variables by letting users build customized graphics that summarize their main characteristics.
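As an illustration of the trend and seasonality analysis this module supports, here is a minimal sketch using NumPy. It is not the application's code: the helper names `rolling_mean` and `seasonal_profile` are hypothetical, and the series is synthetic rather than taken from the XM data.

```python
import numpy as np

def rolling_mean(values, window):
    """Moving average over full windows only; output is shorter by window - 1."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

def seasonal_profile(values, period):
    """Average value at each position of the cycle (e.g. each weekday)."""
    values = np.asarray(values, dtype=float)
    full_cycles = len(values) // period
    return values[: full_cycles * period].reshape(full_cycles, period).mean(axis=0)

# Synthetic daily series: upward trend plus a weekly pattern (not real prices).
days = np.arange(28)
weekly_pattern = np.tile([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 4)
series = 100.0 + days + weekly_pattern

# A 7-day window averages the weekly pattern out, leaving the trend.
trend = rolling_mean(series, window=7)

# Align each window with its middle day, then remove the trend.
detrended = series[3:-3] - trend

# Average deviation from trend at each position of the week.
# Position 0 corresponds to day index 3 because of the alignment above.
profile = seasonal_profile(detrended, period=7)
```

Plotting `trend` and `profile` separately is one simple way to show how much of a series' movement is long-run drift and how much repeats every week.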
The second module covers the forecasting of the price per kilowatt-hour. An interactive graph presents the results of the machine learning models, which generated the energy price forecast from historical price information and explanatory variables such as Hydraulic Availability, Thermal Availability, Flow Contribution, Daily Volume (Mm3), Volume (Mm3), and Daily Useful Volume (GWh).
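To make the idea of forecasting from past prices plus explanatory variables concrete, the sketch below fits a deliberately simple stand-in: a one-lag autoregression with exogenous inputs (ARX) estimated by ordinary least squares. This is not one of the models used in the project. The function names, coefficients, and data are all assumptions made for illustration, and the two synthetic columns merely stand in for variables such as hydraulic availability.

```python
import numpy as np

def fit_arx(price, exog):
    """Least-squares fit of price[t] on [1, price[t-1], exog[t]]."""
    price = np.asarray(price, dtype=float)
    exog = np.asarray(exog, dtype=float)
    design = np.column_stack([np.ones(len(price) - 1), price[:-1], exog[1:]])
    coef, *_ = np.linalg.lstsq(design, price[1:], rcond=None)
    return coef  # [intercept, lag coefficient, one coefficient per exog column]

def forecast_arx(coef, last_price, future_exog):
    """Roll the fitted equation forward, feeding each forecast back in as the lag."""
    forecasts = []
    prev = float(last_price)
    for row in np.asarray(future_exog, dtype=float):
        prev = float(coef[0] + coef[1] * prev + row @ coef[2:])
        forecasts.append(prev)
    return forecasts

# Synthetic data generated from known coefficients (not real market data).
rng = np.random.default_rng(0)
n = 200
exog = rng.normal(size=(n, 2))  # two hypothetical explanatory variables
price = np.zeros(n)
for t in range(1, n):
    price[t] = (50.0 + 0.6 * price[t - 1]
                - 8.0 * exog[t, 0] + 3.0 * exog[t, 1]
                + rng.normal(scale=0.5))

coef = fit_arx(price, exog)

# Forecasting needs future values of the explanatory variables as well;
# here they are drawn at random purely to exercise the function.
forecast = forecast_arx(coef, price[-1], rng.normal(size=(7, 2)))
```

The sketch also shows a practical constraint of any model with explanatory variables: projecting the price requires projections of those variables over the same horizon.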
While developing the application, we tested over 150 different models and selected ARIMA, SARIMAX, and NeuralProphet as the best-performing models to project all the series required for this exercise. Because each model weighs the incorporated information differently, the results they deliver also differ. We therefore added a further projection, the average of the three model projections, which dampens the most extreme values that any one model produces relative to the others.
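The averaged projection can be sketched in a few lines of Python. The helper name `average_projection` and the price values are hypothetical, chosen only to show the mechanics, and are not results from the project.

```python
from statistics import mean

def average_projection(projections):
    """Return the point-wise mean of several equally long forecasts.

    `projections` maps a model name to its list of forecast values.
    """
    lengths = {len(values) for values in projections.values()}
    if len(lengths) != 1:
        raise ValueError("all projections must cover the same horizon")
    # zip(*...) walks the forecasts one horizon step at a time.
    return [mean(step) for step in zip(*projections.values())]

# Hypothetical prices for a three-day horizon (illustrative values only).
projections = {
    "ARIMA": [300.0, 310.0, 320.0],
    "SARIMAX": [280.0, 330.0, 300.0],
    "NeuralProphet": [320.0, 290.0, 340.0],
}

ensemble = average_projection(projections)
print(ensemble)  # [300.0, 310.0, 320.0]
```

At every step the average lies between the lowest and highest individual forecast, which is why it smooths out the value on which one model departs most from the others.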
Finally, because time series models struggle to anticipate unexpected events that affect the price, we estimated the impulse response functions of the SARIMAX model and used them to simulate different types of shocks, which we then applied to all the projections.
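The mechanics of applying a shock through an impulse response can be sketched as follows. This is a simplified illustration, not the project's implementation: it uses the closed-form response of an AR(1) process instead of the response estimated from the fitted SARIMAX model, and the persistence `phi`, the shock size, and the baseline values are all hypothetical. With statsmodels, the response would instead come from the fitted results object's `impulse_responses` method.

```python
def ar1_impulse_response(phi, steps):
    """Response of an AR(1) process to a unit shock: phi**0, phi**1, ..."""
    return [phi ** h for h in range(steps)]

def apply_shock(projection, response, size, start=0):
    """Add a shock of `size` hitting at index `start`, propagated by `response`."""
    shocked = list(projection)
    for h in range(start, len(projection)):
        shocked[h] += size * response[h - start]
    return shocked

phi = 0.5  # hypothetical persistence, not an estimated coefficient
baseline = [300.0, 310.0, 320.0, 330.0]  # hypothetical projection

response = ar1_impulse_response(phi, len(baseline))

# A one-off shock of +40 on the second day decays by half each day after.
shocked = apply_shock(baseline, response, size=40.0, start=1)
print(shocked)  # [300.0, 350.0, 340.0, 340.0]
```

Because the shock is added on top of a projection, the same response can be applied to the output of any model, including ones that do not provide impulse responses themselves.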