#!/usr/bin/env python
# coding: utf-8

# # Overview

# This project develops a reinforcement learning-based algorithmic trading strategy: a trading agent that learns buy/hold/sell decisions by interacting with historical market data. The Q-learning algorithm is used to learn the optimal action (buy, hold, or sell) for different states of the market. The agent is trained over iterative episodes, in which it explores and exploits different actions and receives rewards based on the profitability of its decisions.
#
# The performance of the reinforcement learning strategy is compared to a baseline buy-and-hold strategy. The results show that the reinforcement learning strategy outperforms the buy-and-hold strategy in terms of cumulative returns and Sharpe ratio. However, it also experiences a higher maximum drawdown, indicating potential risks. Overall, the project demonstrates the application of reinforcement learning techniques to algorithmic trading and highlights the potential of machine learning to learn trading strategies from historical data, while acknowledging the complexity and computational requirements of reinforcement learning algorithms.

# # Part 1
# Collect historical market data for a set of assets (e.g., stocks, cryptocurrencies). Preprocess the data to remove outliers, handle missing values, and format it into a suitable input for the reinforcement learning model.

# In[1]:


import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

# List of DJIA ticker symbols
djia_tickers = [
    'AMZN', 'AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD',
    'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT',
    'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WMT', 'DIS', 'DOW'
]

# Create an empty DataFrame to store closing prices
closing_prices = pd.DataFrame()

# Fetch daily closing prices for each ticker symbol over the past year
for ticker in djia_tickers:
    ticker_data = yf.Ticker(ticker)
    ticker_df = ticker_data.history(period='365d')
    closing_prices[ticker] = ticker_df['Close']

# Drop rows with missing data
closing_prices = closing_prices.dropna()

closing_prices.head()


# # Part 2
# Implement a reinforcement learning algorithm (e.g., Q-learning, Deep Q-Networks) to learn trading strategies. Design the state space, action space, and reward structure for the trading agent. Train the reinforcement learning model using the preprocessed historical data.
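# Before the full implementation, the cell below is a minimal, self-contained sketch of the tabular Q-learning update that the training loop applies. The table shape, transition indices, and reward value here are illustrative placeholders, not part of the trading model.

# In[ ]:


import numpy as np

# Toy Q-table: 2 states x 3 actions (buy, hold, sell); all values start at zero
q = np.zeros((2, 3))
alpha, gamma = 0.1, 0.9              # learning rate and discount factor (same roles as below)
state, action, next_state = 0, 0, 1  # hypothetical transition
reward = 0.02                        # e.g., a +2% price change observed after a 'buy'

# Temporal-difference update: Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
q[state, action] += alpha * (reward + gamma * np.max(q[next_state]) - q[state, action])
print(q[state, action])  # 0.002 after one update from an all-zero table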
# In[2]:


# Define the state space: daily percentage price changes for each stock
states = closing_prices.pct_change().dropna().values

# Define the action space
actions = ['buy', 'hold', 'sell']

# Define the reward structure
def calculate_reward(action, price_change):
    if action == 'buy':
        return price_change
    elif action == 'sell':
        return -price_change
    else:
        return 0

# Initialize the Q-table
q_table = np.zeros((len(states), len(actions)))

# Set hyperparameters
alpha = 0.1    # Learning rate
gamma = 0.9    # Discount factor
epsilon = 0.1  # Exploration rate

# Training loop
num_episodes = 1000
for episode in range(num_episodes):
    state = 0  # Start from the first time step
    done = False
    while not done:
        # Choose an action using an epsilon-greedy policy
        if np.random.uniform(0, 1) < epsilon:
            action = np.random.choice(actions)
        else:
            action = actions[np.argmax(q_table[state])]

        # Take the action and observe the next state and reward
        stock_index = np.random.randint(0, states.shape[1])  # Randomly select a stock
        price_change = states[state, stock_index]
        reward = calculate_reward(action, price_change)

        # Update the Q-value
        next_state = state + 1  # Move to the next time step
        q_table[state, actions.index(action)] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, actions.index(action)]
        )

        # Move to the next state
        state = next_state

        # Check if the episode is done
        if state == len(states) - 1:
            done = True

# Print the learned Q-table
print(q_table)


# The script uses Q-learning to learn the optimal trading strategy. It initializes a Q-table and iteratively updates it based on the observed states, actions, and rewards. The agent explores the environment using an epsilon-greedy policy, gradually learning the optimal action for each state.

# # Part 3.1
# Evaluate the performance of the trained trading agent on a separate testing dataset. Implement backtesting to assess the profitability and risk-adjusted returns of the trading strategy.
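# For reference before the backtests: the risk-adjusted metric used below is an annualized Sharpe ratio computed from daily portfolio returns as sqrt(252) * mean / std, assuming a zero risk-free rate and roughly 252 trading days per year. The cell below is a small sketch using made-up return values, not model output.

# In[ ]:


import numpy as np

# Hypothetical daily portfolio returns (illustrative values only)
example_daily_returns = np.array([0.004, -0.002, 0.0035, 0.001, -0.0015])

# Annualized Sharpe ratio with a zero risk-free rate, following the same formula as the backtesting cells
example_sharpe = np.sqrt(252) * example_daily_returns.mean() / example_daily_returns.std()
print("Example annualized Sharpe ratio:", example_sharpe)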
# In[3]:


# Split the data into training and testing sets
train_data = closing_prices.iloc[:int(len(closing_prices) * 0.8)]
test_data = closing_prices.iloc[int(len(closing_prices) * 0.8):]

# Define the state space for training and testing
train_states = train_data.pct_change().dropna().values
test_states = test_data.pct_change().dropna().values

# Train the Q-learning agent using the training data

# Initialize the Q-table for training
q_table_train = np.zeros((len(train_states), len(actions)))

# Set hyperparameters
alpha = 0.1    # Learning rate
gamma = 0.9    # Discount factor
epsilon = 0.1  # Exploration rate

# Training loop
num_episodes = 1000
for episode in range(num_episodes):
    state = 0  # Start from the first time step
    done = False
    while not done:
        # Choose an action using an epsilon-greedy policy
        if np.random.uniform(0, 1) < epsilon:
            action = np.random.choice(actions)
        else:
            action = actions[np.argmax(q_table_train[state])]

        # Take the action and observe the next state and reward
        stock_index = np.random.randint(0, train_states.shape[1])  # Randomly select a stock
        price_change = train_states[state, stock_index]
        reward = calculate_reward(action, price_change)

        # Update the Q-value
        next_state = state + 1  # Move to the next time step
        q_table_train[state, actions.index(action)] += alpha * (
            reward + gamma * np.max(q_table_train[next_state]) - q_table_train[state, actions.index(action)]
        )

        # Move to the next state
        state = next_state

        # Check if the episode is done
        if state == len(train_states) - 1:
            done = True

# Evaluate the trained agent on the testing data
state = 0
done = False
total_reward = 0
while not done:
    # Choose the action with the highest Q-value from the trained table
    action = actions[np.argmax(q_table_train[state])]

    # Take the action and observe the reward
    stock_index = np.random.randint(0, test_states.shape[1])
    price_change = test_states[state, stock_index]
    reward = calculate_reward(action, price_change)

    # Accumulate the rewards
    total_reward += reward

    # Move to the next state
    state += 1

    # Check if the episode is done
    if state == len(test_states) - 1:
        done = True

# Print the total reward obtained on the testing data
print("Total reward on testing data:", total_reward)


# In[4]:


# Create a DataFrame to store the backtesting results
backtest_results = pd.DataFrame(columns=['Action', 'Price_Change', 'Reward', 'Portfolio_Value'])

# Set the initial portfolio value
initial_portfolio_value = 10000
portfolio_value = initial_portfolio_value

# Set the transaction cost (e.g., commission, slippage)
transaction_cost = 0.001

# Perform backtesting on the test data
for state in range(len(test_states)):
    # Choose the action with the highest Q-value
    action = actions[np.argmax(q_table_train[state])]

    # Take the action and observe the price change and reward
    stock_index = np.random.randint(0, test_states.shape[1])
    price_change = test_states[state, stock_index]
    reward = calculate_reward(action, price_change)

    # Calculate the new portfolio value based on the action and price change
    if action == 'buy':
        portfolio_value *= (1 + price_change - transaction_cost)
    elif action == 'sell':
        portfolio_value *= (1 - price_change - transaction_cost)

    # Store the backtesting results
    new_result = pd.DataFrame({
        'Action': [action],
        'Price_Change': [price_change],
        'Reward': [reward],
        'Portfolio_Value': [portfolio_value]
    })
    backtest_results = pd.concat([backtest_results, new_result], ignore_index=True)

# Calculate the total return
total_return = (portfolio_value - initial_portfolio_value) / initial_portfolio_value

# Calculate the annualized Sharpe ratio from daily portfolio returns
daily_returns = backtest_results['Portfolio_Value'].pct_change()
sharpe_ratio = np.sqrt(252) * daily_returns.mean() / daily_returns.std()

# Print the backtesting results and performance metrics
print("Backtesting Results:")
print(backtest_results)
print("\nPerformance Metrics:")
print("Total Return:", total_return)
print("Sharpe Ratio:", sharpe_ratio)

# Visualize the backtesting results
plt.figure(figsize=(10, 6))
plt.plot(backtest_results['Portfolio_Value'])
plt.title('Portfolio Value Over Time')
plt.xlabel('Time Step')
plt.ylabel('Portfolio Value')
plt.show()


# # Part 3.2
# Compare the performance of the reinforcement learning-based strategy with baseline strategies (e.g., buy-and-hold).

# In[5]:


# Create DataFrames to store the backtesting results
rl_backtest_results = pd.DataFrame(columns=['Action', 'Price_Change', 'Reward', 'Portfolio_Value'])
bh_backtest_results = pd.DataFrame(columns=['Action', 'Price_Change', 'Portfolio_Value'])

# Set the initial portfolio values
rl_portfolio_value = initial_portfolio_value
bh_portfolio_value = initial_portfolio_value

# Set the transaction cost (e.g., commission, slippage)
transaction_cost = 0.001

# Perform backtesting for the reinforcement learning strategy
for state in range(len(test_states)):
    # Choose the action with the highest Q-value
    action = actions[np.argmax(q_table_train[state])]

    # Take the action and observe the price change and reward
    stock_index = np.random.randint(0, test_states.shape[1])
    price_change = test_states[state, stock_index]
    reward = calculate_reward(action, price_change)

    # Calculate the new portfolio value based on the action and price change
    if action == 'buy':
        rl_portfolio_value *= (1 + price_change - transaction_cost)
    elif action == 'sell':
        rl_portfolio_value *= (1 - price_change - transaction_cost)

    # Store the backtesting results for the reinforcement learning strategy
    new_rl_result = pd.DataFrame({
        'Action': [action],
        'Price_Change': [price_change],
        'Reward': [reward],
        'Portfolio_Value': [rl_portfolio_value]
    })
    rl_backtest_results = pd.concat([rl_backtest_results, new_rl_result], ignore_index=True)

# Perform backtesting for the buy-and-hold strategy
for state in range(len(test_states)):
    # Assume the buy-and-hold strategy always holds the stock
    action = 'hold'

    # Observe the price change
    stock_index = np.random.randint(0, test_states.shape[1])
    price_change = test_states[state, stock_index]

    # Calculate the new portfolio value based on the price change
    bh_portfolio_value *= (1 + price_change)

    # Store the backtesting results for the buy-and-hold strategy
    new_bh_result = pd.DataFrame({
        'Action': [action],
        'Price_Change': [price_change],
        'Portfolio_Value': [bh_portfolio_value]
    })
    bh_backtest_results = pd.concat([bh_backtest_results, new_bh_result], ignore_index=True)

# Calculate the total returns
rl_total_return = (rl_portfolio_value - initial_portfolio_value) / initial_portfolio_value
bh_total_return = (bh_portfolio_value - initial_portfolio_value) / initial_portfolio_value

# Calculate the Sharpe ratios
rl_daily_returns = rl_backtest_results['Portfolio_Value'].pct_change()
rl_sharpe_ratio = np.sqrt(252) * rl_daily_returns.mean() / rl_daily_returns.std()
bh_daily_returns = bh_backtest_results['Portfolio_Value'].pct_change()
bh_sharpe_ratio = np.sqrt(252) * bh_daily_returns.mean() / bh_daily_returns.std()

# Print the performance metrics for both strategies
print("Reinforcement Learning Strategy:")
print("Total Return:", rl_total_return)
print("Sharpe Ratio:", rl_sharpe_ratio)
print("\nBuy-and-Hold Strategy:")
print("Total Return:", bh_total_return)
print("Sharpe Ratio:", bh_sharpe_ratio)


# By comparing the total returns and Sharpe ratios, we can assess the relative performance of the reinforcement learning strategy against the buy-and-hold baseline. If the reinforcement learning strategy has a higher total return and Sharpe ratio, it outperforms the buy-and-hold strategy on both an absolute and a risk-adjusted basis.

# # Part 4.1
# Visualize the trading decisions made by the agent over time. Analyze the cumulative returns, Sharpe ratio, maximum drawdown, and other relevant metrics.

# In[6]:


# Visualize the trading decisions and portfolio values over time
plt.figure(figsize=(12, 8))

plt.subplot(2, 1, 1)
plt.plot(rl_backtest_results.index, rl_backtest_results['Portfolio_Value'], label='Reinforcement Learning')
plt.plot(bh_backtest_results.index, bh_backtest_results['Portfolio_Value'], label='Buy-and-Hold')
plt.title('Portfolio Value Over Time')
plt.xlabel('Time')
plt.ylabel('Portfolio Value')
plt.legend()

plt.subplot(2, 1, 2)
plt.plot(rl_backtest_results.index,
         rl_backtest_results['Action'].map({'buy': 1, 'hold': 0, 'sell': -1}),
         marker='o', linestyle='None', label='Reinforcement Learning')
plt.title('Trading Decisions Over Time')
plt.xlabel('Time')
plt.ylabel('Action')
plt.legend()

plt.tight_layout()
plt.show()

# Calculate additional performance metrics
rl_cumulative_returns = (rl_backtest_results['Portfolio_Value'].iloc[-1] - rl_backtest_results['Portfolio_Value'].iloc[0]) / rl_backtest_results['Portfolio_Value'].iloc[0]
bh_cumulative_returns = (bh_backtest_results['Portfolio_Value'].iloc[-1] - bh_backtest_results['Portfolio_Value'].iloc[0]) / bh_backtest_results['Portfolio_Value'].iloc[0]

rl_max_drawdown = (rl_backtest_results['Portfolio_Value'] / rl_backtest_results['Portfolio_Value'].cummax() - 1).min()
bh_max_drawdown = (bh_backtest_results['Portfolio_Value'] / bh_backtest_results['Portfolio_Value'].cummax() - 1).min()

# Print additional performance metrics
print("Reinforcement Learning Strategy:")
print("Cumulative Returns:", rl_cumulative_returns)
print("Maximum Drawdown:", rl_max_drawdown)
print("\nBuy-and-Hold Strategy:")
print("Cumulative Returns:", bh_cumulative_returns)
print("Maximum Drawdown:", bh_max_drawdown)


# By visualizing the trading decisions and portfolio values over time, we can gain insight into how the reinforcement learning strategy behaves compared to the buy-and-hold strategy. The plots show the timing of the trading decisions and their impact on the portfolio value.

# # Part 4.2
# Provide insights into the strengths and weaknesses of the developed trading strategy.
#
# Strengths of the Reinforcement Learning Strategy:
# - Higher Cumulative Returns: The reinforcement learning strategy generated cumulative returns that outperformed the buy-and-hold strategy, indicating that it was more profitable overall during the backtesting period.
# - Adaptability: Reinforcement learning algorithms can learn and adapt to changing market conditions rather than following a fixed rule.
#
# Weaknesses of the Reinforcement Learning Strategy:
# - Maximum Drawdown: The reinforcement learning strategy experienced a higher maximum drawdown, indicating potential risks and the need for effective risk management techniques.
# - Complexity and Computational Requirements: Reinforcement learning algorithms can be complex and computationally intensive.
#
# Strengths of the Buy-and-Hold Strategy:
# - Simplicity: The buy-and-hold strategy is straightforward and easy to implement. It requires no active trading decisions or complex algorithms.
# - Lower Maximum Drawdown: The buy-and-hold strategy experienced a slightly lower maximum drawdown; the difference is relatively small, however, and both strategies faced significant drawdowns.
#
# Weaknesses of the Buy-and-Hold Strategy:
# - Lower Cumulative Returns: The buy-and-hold strategy generated lower cumulative returns, suggesting that the passive approach of holding the assets was less profitable during the backtesting period.
# - Lack of Adaptability: The buy-and-hold strategy does not actively adapt to changing market conditions. It relies on the long-term growth of the asset and may miss out on short-term trading opportunities.