- Increased max_token_length from 16000 to 32000 to allow for longer inputs. - Adjusted agent_temperature from 0.6 to 0.8 for more varied responses. - Extended test_timeout from 180 to 600 seconds to accommodate longer evaluations. - Updated data directory path for saving evaluations to ensure proper organization.