This study explored the effectiveness of generative AI-based automated scoring for essay-type assessments in social studies by applying various prompt engineering techniques and evaluating their performance and usefulness. Based on Bloom’s taxonomy (remember & understand / apply & analyze / evaluate & create), three essay items and their rubrics were developed, and ten types of prompts were designed. Automated scoring results from a GPT model were compared quantitatively and qualitatively with teacher scoring results. The key findings are as follows. First, the effectiveness of prompt types varied depending on the cognitive level required by the item. In particular, for lower-order items, prompts that encouraged step-by-step reasoning, such as “Chain of Thought” and “Multi-Zero-Shot”, showed higher accuracy. Second, scoring performance improved when rubrics and Zero-Shot prompts were written clearly and specifically. Third, prompts with added sample answers and few-shot prompts performed worse than Zero-Shot prompts, despite the additional input cost. Finally, when tested on new data, the optimal prompts demonstrated ‘almost perfect’ agreement with teacher scoring for Items 1 and 2 (QWK = 0.88 and 0.86, respectively) and ‘substantial’ agreement for Item 3 (QWK = 0.63). In addition, qualitative analyses of disagreement cases were conducted, and the usefulness of each prompt technique was discussed in detail.
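For readers unfamiliar with the agreement metric reported above, the following is a minimal sketch of how quadratic weighted kappa (QWK) between teacher scores and model scores can be computed. It is not the study's own code: the function name `quadratic_weighted_kappa`, its parameters, and the example scores are all hypothetical, and the sketch assumes integer rubric scores on a known scale.

```python
def quadratic_weighted_kappa(teacher, model, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Hypothetical helper for illustration; assumes every score lies in
    [min_score, max_score] and both lists have the same length.
    """
    if len(teacher) != len(model) or not teacher:
        raise ValueError("score lists must be non-empty and of equal length")

    n_cat = max_score - min_score + 1
    n = len(teacher)
    if n_cat < 2:
        return 1.0  # a single score category leaves no room to disagree

    # Observed confusion matrix: rows are teacher scores, columns are model scores.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for t, m in zip(teacher, model):
        observed[t - min_score][m - min_score] += 1

    # Marginal score counts for each rater.
    hist_teacher = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: larger score gaps count disproportionately more.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            expected = hist_teacher[i] * hist_model[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        return 1.0  # both raters gave one identical score to every response
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Made-up scores on a 0-2 rubric, for illustration only.
    teacher_scores = [0, 1, 2, 2]
    model_scores = [0, 1, 1, 2]
    print(quadratic_weighted_kappa(teacher_scores, model_scores, 0, 2))
```

Because the penalty grows with the square of the score gap, a model that misses by one rubric point is penalized far less than one that misses by two, which suits ordinal rubric scores better than simple exact-match accuracy.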