Introducing HWIT: Using Prompt Engineering to Stimulate Creativity in Human-Machine Teams
The question of how best to use artificial intelligence (AI) resources in the military, including for writing, is an important one. AI marketing materials (and other rhetorical efforts) frame AI and large language models (LLMs) as capable of enhancing human creativity, or even of being creative in their own right. Some of this hype stems from the desire to sell these products to writers—including those in the military. As a practical matter, however, AI’s creative potential remains a subject of debate. Most of the ideas generated by AI are conventional, and its ability to achieve surprise or novelty is limited. Additionally, some research finds that while AI improves individual performance and creativity, it stifles the creativity of larger groups. For my part, while I find value in employing AI in the classroom, I too am skeptical about its inherent “creativity.”
If AI is not actually all that creative, that poses a real problem for military writing, especially as the U.S. military progressively turns to AI to maintain its competitive warfighting advantage over its adversaries. Working with LLMs is often associated with cognitive offloading, whereby human users shift the burden of creative and critical thought to AI. Yet creative and critical thinking require higher-order thought, and the arts, including the art of military planning, are fundamentally creative and critical processes. It is one thing to offload tasks that can and should be automated; it is another thing entirely to shift creative tasks requiring higher-order thought to an AI. Using AI to summarize meeting notes or format briefing slides can be invaluable to staff officers drowning in paperwork, just as using it to check spelling, grammar, and formatting can be invaluable to students.
But what about using AI to draft orders, to craft options for complex military problems, or even to write in an academic setting? The risk in shifting the burden of creative expression to an AI is regression to the mean. LLMs are built on powerful statistical models and massive datasets that treat frequency as synonymous with importance, so their output generally reflects the underlying averages found in their training data. This is, at least in part, the origin of both AI hallucination and the proclivity of AI-generated writing to overuse certain words and punctuation. An AI that anchors on a term is likely to produce output correlated with related terms in its data, regardless of whether those associations are relevant to a given prompt or whether the ostensible truth claims in its output are accurate.
LLM content is said to be non-deterministic, which is to say that the same prompt often produces different content, yet LLMs are, paradoxically, like predictive text messaging on steroids. This is more than a metaphor: AI-generated output is often predictable. Anyone who has used the same prompt three, six, or twelve times only to end up with three, six, or twelve “different” variations on the same output can attest to this. More important are the implications for warfighting, where unpredictability is essential to strategic planning. If adversaries can approximate the input of a planner using an LLM trained on the same or similar data, they can produce similar output, which means they can anticipate the plans of anyone over-relying on AI just as surely as they can calculate the same sums using a calculator.
Despite this, under the right conditions, human-machine teams, or “centaurs,” are often more capable than either humans or AIs working alone. It is plausible that this advantage stems from human creativity itself. The imagination, and in particular the human ability to adapt to uncertainty by envisioning a wider range of alternative futures over longer time horizons than AI can, makes people superior at strategically oriented tasks, while the speed and accuracy of AI generally make it more effective at the tactical level. In other words, if wielded in the right ways, LLMs have real potential to become powerful creative partners. And this is to say nothing of the role played by human judgment in the moral and ethical use of AI for military ends.
This raises the question of how to achieve the benefits of human-machine teaming without succumbing to the brain’s natural predisposition for cognitive offloading when working with an LLM. Deliberate effort is required, but good prompt engineering—the process of writing input in a way that helps an LLM better comprehend a user’s intent and meaning—can facilitate that effort. In fact, prompt engineering is so central to productive interaction with LLMs that the most effective prompt engineers are already engaged in higher-order thinking. The problem with extant prompt engineering techniques (like zero-shot prompting, few-shot prompting, and others) is that they do not necessarily include a forcing function requiring creative human input from their users.
In other words, the problem with over-relying on extant prompt engineering techniques to complete a creative project is that some users will task an LLM without providing adequate creative input, leaving the AI to generate the results entirely on its own. Since much of the discourse on AI currently centers around the perils and promise it poses for writing and education, think of a high school or college student assigned to write a paper. An effective prompt would give an LLM insight into the student’s own brainstorming and original ideas. However, crafting a zero-shot prompt that only includes their paper instructions without any additional input will produce an unoriginal product. Students who fall prey to the trap of cognitive offloading will then submit the resulting output as if it is “good enough.” This is the kind of AI use that has educators raising warning flags.
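To make the difference concrete, consider a minimal sketch of the two approaches; the assignment and the student’s ideas below are hypothetical placeholders, not drawn from any particular course or model:

```python
# A bare zero-shot prompt: the assignment is pasted in, but none of the
# student's own thinking comes with it.
zero_shot = (
    "Write a five-page paper on the causes of the First World War, "
    "citing at least three scholarly sources."
)

# The same request with the student's brainstorming attached. The added
# material forces the model to respond to the student's idea rather than
# drift toward the statistical average of its training data.
with_student_input = (
    zero_shot
    + "\n\nHere's what I think: the alliance system mattered less than "
    "mobilization timetables, and I want to test that argument against "
    "the July Crisis. Push back where my reasoning is weakest."
)

print(with_student_input)
```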
This gap results from the fact that most current tools boil down to the same baseline task: “Here’s a request, now generate a response.” To redress this gap, I have developed a new kind of prompt (see the figure below) called a HWIT prompt—think Stewie Griffin pronouncing the h in whip. HWIT, for “Here’s What I Think,” can serve as a standalone, specialized zero-shot prompt, or as an inject into other existing prompt formats. Either way, it requires active human-in-the-loop participation in the crafting of AI-generated output by expanding the baseline request to “Here’s a request, here’s my idea, now generate a response.” As a method of prompt engineering, I believe HWIT will enable the blossoming of effective creative partnerships when used correctly.
Task: [Describe the LLM’s ultimate output.]
Context: [Provide relevant background, including audience, stakes, tone, or other constraints.]
HWIT: Here’s what I think: [Provide starting ideas, best guesses, creative angles, tentative arguments, or other interesting/weird details.] Restate my ideas in your own words so I know you understand them, and ask [#] questions that you think would help us improve my ideas.
Other Instructions (Optional):
Output: [Specify guidance for output (e.g., bullets, outlines, essays, etc.).] DO NOT produce [specify output] until I have answered all of your questions.
Figure. HWIT Prompt Template
So how does it work? As the figure above shows, HWIT, like any other prompt, specifies a concrete task—in this case, the generation of creative output. With HWIT, users also specify context. This can include things like audience or tone, but it can also include project details (e.g., “I’m developing a COA analysis,” “I’m comparing operational approaches in a wargame,” “I’m writing an information paper for the commander”). Finally, users specify their own ideas and their preferred output, along with any optional instructions they wish to provide regarding things like critical feedback or persona. To use the template, users simply replace the bracketed text with details specific to their own creative tasks.
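As a worked illustration, the sketch below fills in the template’s fields for a hypothetical staff-officer scenario and assembles them into a single prompt; the wording is mine rather than a canonical example, and the same text could just as easily be typed directly into a chat window:

```python
# Hypothetical HWIT prompt assembled from the template's fields.
hwit_fields = {
    "Task": "Help me draft a one-page information paper for the commander.",
    "Context": (
        "The audience is the brigade commander, the tone is formal and "
        "concise, and the paper compares two courses of action for an "
        "upcoming logistics exercise."
    ),
    "HWIT": (
        "Here's what I think: COA 2 looks riskier on paper, but it frees up "
        "the main supply route earlier, and I suspect that outweighs the "
        "risk. Restate my ideas in your own words so I know you understand "
        "them, and ask 3 questions that you think would help us improve my "
        "ideas."
    ),
    "Other Instructions (Optional)": "Act as a critical but fair creative partner.",
    "Output": (
        "A bulleted outline. DO NOT produce the outline until I have "
        "answered all of your questions."
    ),
}

hwit_prompt = "\n".join(f"{field}: {text}" for field, text in hwit_fields.items())
print(hwit_prompt)
```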
In essence, HWIT works like retrieval-augmented generation by intentionally cueing the LLM to incorporate a user’s thinking as a source of external knowledge, and like prompt chaining by deliberately inducing conversational iteration through a question-and-answer session in which the user—not the LLM—is required to build on their own ideas. In this way, AI does not supplant the human responsibility for critical and creative thought. Instead, it enhances our writing and thinking by acting as the kind of creative partner and sounding board so many AI advocates wish it to be.
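In practical terms, the exchange unfolds in two passes: the model first restates the user’s ideas and asks its questions, and only after the human answers does it produce the requested output. A minimal sketch of that loop follows; send_chat is a stand-in for whatever model interface a user actually has (a vendor SDK, a local model, or simply a chat window), not a real API call:

```python
def send_chat(messages):
    """Stand-in for an LLM chat call; replace with your own model interface."""
    raise NotImplementedError

hwit_prompt = "..."  # the assembled HWIT prompt from the sketch above

# Pass 1: the model restates the user's ideas and asks clarifying questions,
# withholding the final output per the template's instructions.
history = [{"role": "user", "content": hwit_prompt}]
questions = send_chat(history)
history.append({"role": "assistant", "content": questions})

# Pass 2: the human, not the model, supplies the creative iteration by
# answering those questions; only then is the final output generated.
answers = input("Answer the model's questions: ")
history.append({"role": "user", "content": answers})
final_output = send_chat(history)
print(final_output)
```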
Of course, risks remain. The utopian impulse to treat AI as a panacea must still be tempered. First, HWIT is not for simple or menial tasks. Most professionals will find this tool to be overkill for writing computer code, doing math, or even writing simple emails. The good news, however, is that using this prompt is entirely voluntary. No one need incorporate it into their workflow unless they are aiming to produce a creative work. HWIT will not help a user challenge their own biases unless purposely prompted, nor will it solve the problem of bias in an LLM’s underlying training data.
Another issue is that this technique will not necessarily address known problems with LLMs like hallucination, sycophancy, or AI psychosis. HWIT might help reduce some instances of hallucination by giving the model specific ideas on which to anchor its contributions. Sycophancy, like the overuse of certain phrases, results in part from reinforcement learning from human feedback, but clear guidance about persona may be the best way to prevent an LLM from fawning unhelpfully over one’s ideas. I recommend advising the model to serve as a critical but fair creative partner responsible for producing constructive feedback; a sample wording appears after this paragraph. At the same time, effective prompt engineering for creativity’s sake will always require cognitive effort. A vacuous HWIT or an “I don’t know” cannot force human participation unless the user leverages the question-and-answer session to promote brainstorming. That approach may work for staff officers pressed for time or for busy students, but lazy users are unlikely to produce meaningful results, leaving them susceptible to cognitive offloading and, with it, the dangers of adversary sensemaking efforts or other perils (like plagiarism or cheating).
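For instance, the persona guidance might be added to the template’s optional instructions along these lines (the wording is only a suggestion):

```python
# Hypothetical persona instruction for the Other Instructions field.
persona = (
    "Act as a critical but fair creative partner. Identify the weakest part "
    "of my idea before offering any praise, and explain why it is weak."
)
```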
Other solutions, like conversational prompting, might produce similar results without the formal prompt, but they depend on active participation in a dialogue and, if not structured properly, increase the risk of cognitive offloading. Better still would be to combine the power of HWIT with conversational prompting, thus encouraging additional iteration, or with other techniques like tree-of-thought prompting, which may help facilitate long-term planning and COA development. Alternatively, agents might be built with rulesets that require human participation, but this likely demands more familiarity with LLMs than average users currently possess. From an educational perspective, simply not using AI at all remains a viable option as well. Many of those who refuse to integrate AI into their teaching will undoubtedly remain unmoved by HWIT, but these educators perform a different, and very real, service. In the context of a liberal education, students must learn how to think both with and without AI. Decisions about incorporating AI should be driven by the learning objectives one is trying to achieve, not by new technological trends (be they fads or tools with real longevity).
On the other hand, AI seems to be here whether we like it or not. Those who refuse to integrate AI into their professional practice at all risk being left behind in their respective industries. But there is a middle ground between the dystopianism of the AI skeptic and the utopianism of the AI disciple, and reaching it requires users to understand how to use AI effectively without leaning on it as a crutch. Writing is thinking, and effective prompt engineering, like that embodied in the HWIT technique, is writing. If HWIT is thinking, as this logic would imply, it may well present one of the smoothest paths to that middle ground.
Dr. Luke M. Herrington is an assistant professor of social science at the School of Advanced Military Studies at Fort Leavenworth, Kansas. His adventures teaching with AI are chronicled in the September 2025 edition of Journal of Military Learning and the March-April 2026 edition of Military Review. Herrington’s work also appears in Joint Force Quarterly, E-International Relations, and The Diplomat.
Disclaimers/Acknowledgements:
Opinions, conclusions, and recommendations expressed or implied are solely those of the author and do not necessarily represent the School of Advanced Military Studies, the Command and General Staff College, the Department of War, or any other U.S. government agency; references to specific platforms and software are not intended as either endorsements or promotion.
This is an original essay written wholly by the author, but during the brainstorming process, the HWIT method was tested to produce the prompt template and an example blog using ChatGPT and Gemini, respectively. The author also acknowledges Jacob A. Mauslein, Dan G. Cox, and Jeannie Herrington for their help refining the final product. Some of the ideas in this essay were also shaped by Kenneth Payne’s “Artificial Intelligence in National Security” MOOC.