Cisco tried to use artificial intelligence to write security incident reports – and things didn’t quite go as planned

Cisco warns AI-generated incident reports are often inaccurate, inconsistent and prone to data loss due to LLM limitations
The company advises detailed, single-task prompts, fixed source documents, and strict formatting rules to improve reliability
Cross-contamination between reports remains a challenge, with researchers recommending new sessions for each new incident report to avoid errors

Any enterprise looking to use AI tools for its security reporting might want to read a new report from Cisco detailing the company's experience with AI-generated incident reports.

The company has warned that those using AI to create long-form technical content should expect “significant inaccuracies, unusual conclusions and inconsistent writing styles”, mostly due to the probabilistic nature of large language models (LLMs).

“These models generate output by predicting the next token, typically a word or subword, in a sequence, based on model weights and training data,” says Cisco, or, as The Register puts it, “they’re basically a fancy autocomplete system that makes educated guesses.”
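The next-token idea in that quote can be illustrated with a toy sketch. The probability table and function name below are hypothetical and purely illustrative; a real model computes these probabilities from its weights rather than reading them from a fixed table.

```python
import random

# Hypothetical toy distribution over candidate next tokens (an assumption
# for illustration only, not taken from any real model).
NEXT_TOKEN_PROBS = {
    "attacker": 0.5,
    "adversary": 0.3,
    "intruder": 0.2,
}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# An unseeded generator: repeated runs can pick different tokens even
# though the input distribution never changes.
rng = random.Random()
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(5)]
print(samples)
```

Because each token is drawn at random from a weighted distribution, two runs over identical input can produce different wording, which is the kind of variation Cisco describes.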

What works and what doesn’t

Because an LLM only predicts the next token, four key problems follow, according to Cisco:

LLMs use different data for each new inquiry, making consistency and standardization a challenge
Even if the same data is shared, the result will always be slightly different
Each new document will have different structure and formatting, which is another standardization challenge
AI often discards valuable data, changing the outcome

This does not mean that artificial intelligence is useless for long-form technical reporting – quite the contrary. It can still save companies a lot of time, but the tool must be correctly set up and optimized.

Cisco says a good approach is to give the AI “granular, single-task instructions focused on a specific, small part of the report”.

The company also said that the AI should not be free to choose its own sources for the report, but should instead be given specific documents to work from. Finally, the AI should have clear instructions regarding formatting and style.
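As a rough sketch of what that advice might look like in practice, the snippet below assembles one prompt for one small part of a report, with a fixed set of source documents and explicit style rules. The function name, prompt wording, and sample inputs are all assumptions made for illustration; Cisco's actual prompts are not reproduced here, and no particular AI service is called.

```python
def build_section_prompt(section_name, task, source_documents, style_rules):
    """Assemble a single-task prompt for one small part of the report.

    All names and wording here are hypothetical, not Cisco's.
    """
    if not source_documents:
        raise ValueError("at least one source document is required")
    lines = [
        f"Task: write only the '{section_name}' section of the incident report.",
        f"Instruction: {task}",
        "Use only the source documents listed below. Do not use any other source.",
        "",
        "Source documents:",
    ]
    # Fixed sources: the caller decides exactly which documents are included.
    for name, text in source_documents.items():
        lines.append(f"--- {name} ---")
        lines.append(text)
    lines.append("")
    lines.append("Formatting and style rules:")
    lines.extend(f"- {rule}" for rule in style_rules)
    return "\n".join(lines)

# Example inputs (invented for illustration).
prompt = build_section_prompt(
    section_name="Timeline",
    task="List the events in chronological order.",
    source_documents={"analyst_notes.txt": "09:14 suspicious login observed."},
    style_rules=["Use past tense.", "One event per line."],
)
print(prompt)
```

Keeping each prompt to a single section makes the output easier to check against the listed sources before moving on to the next part of the report.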

“A blind test of the sample report in our quality assurance process showed no noticeable drop in overall write quality,” Cisco said.

“The peer reviewer, professional editor, and management reviewer all made complimentary comments about the report while not realizing it was AI-generated. The peer reviewer commented that the incidence of typos and grammatical errors was far lower than in the average report.”

Cisco also discovered another challenge — when the AI is asked to edit multiple sample reports in a single session, content from one report’s source material becomes cross-contaminated with another, “even though the notes used to generate the first report were deleted from the project’s reference documents.”

To solve this problem, the researchers advised starting a new session and re-entering the instructions for each new incident report.
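The one-session-per-report advice can be sketched in a few lines. The `Session` class below is a hypothetical stand-in that only tracks its own message history; it is not the API of any real AI product, and the incident names and notes are invented.

```python
class Session:
    """Hypothetical stand-in for one chat session; it only tracks its own history."""

    def __init__(self, instructions):
        # Instructions are re-entered at the start of every session.
        self.messages = [instructions]

    def add_source(self, text):
        self.messages.append(text)

def sessions_for_incidents(instructions, incidents):
    """Create one fresh session per incident so nothing carries over."""
    sessions = {}
    for incident_id, notes in incidents.items():
        session = Session(instructions)  # new session for each report
        for note in notes:
            session.add_source(note)
        sessions[incident_id] = session
    return sessions

# Invented sample data for illustration.
incidents = {
    "incident-A": ["Phishing email opened on a laptop."],
    "incident-B": ["Exposed storage bucket discovered."],
}
sessions = sessions_for_incidents("Write the summary section only.", incidents)
for incident_id, session in sessions.items():
    print(incident_id, session.messages)
```

Each session holds only its own instructions and notes, so material from one incident cannot appear in the context used for another.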