Commands are the key feature that takes an application from plain speech-to-text to a complete dictation solution. Put your users in the driver's seat to control their workflow by defining commands to insert templates, navigate the application, automate repetitive tasks, and more! Commands are set in the dictation configuration, either when calling the /transcribe API directly or when using the Dictation Web Component. Below are example commands, showing both the configuration and server response.
See the full API specification here for more information on how to define commands in your configuration.

API Definitions

Explanation of arguments used in /transcribe command configuration:
| Argument | Type | Definition |
| --- | --- | --- |
| id | string | Identifier for the command when it is detected by the speech recognizer. |
| phrases | string array | Word sequence that is spoken to trigger the command. |
| variables | string array | Placeholder in phrases to define multiple words that should trigger the command. The word options are defined as an enum list within the variables object. |
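
Because each placeholder in a phrase has to line up with a variable key, a small client-side check can catch typos before a configuration is sent. The following is only a sketch: `validateCommand` is a hypothetical helper, not part of the API, and it assumes variables use the `key`, `type`, and `enum` fields shown in the examples on this page.

```javascript
// Hypothetical helper, not part of the API: returns a list of problems
// found in one command definition (an empty list means none were found).
function validateCommand(command) {
  const errors = [];
  const phrases = command.phrases ?? [];
  const variables = command.variables ?? [];
  const declaredKeys = new Set(variables.map((variable) => variable.key));
  const usedKeys = new Set();

  if (!command.id) errors.push("Command is missing an id");
  if (phrases.length === 0) errors.push("Command has no phrases");

  for (const phrase of phrases) {
    // Collect every {placeholder} that appears in the phrase.
    for (const match of phrase.matchAll(/\{(\w+)\}/g)) {
      const key = match[1];
      usedKeys.add(key);
      if (!declaredKeys.has(key)) {
        errors.push(`Phrase "${phrase}" uses undeclared variable "${key}"`);
      }
    }
  }

  for (const variable of variables) {
    if (!usedKeys.has(variable.key)) {
      errors.push(`Variable "${variable.key}" is not used in any phrase`);
    }
    const hasOptions = Array.isArray(variable.enum) && variable.enum.length > 0;
    if (variable.type === "enum" && !hasOptions) {
      errors.push(`Variable "${variable.key}" has an empty enum list`);
    }
  }
  return errors;
}
```

For example, a definition whose phrase says `{section}` while its variable key is `section_key` would be reported twice: once for the undeclared placeholder and once for the unused variable.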

Example commands

When the server recognizes a configured command, it returns a command object over the WebSocket (instead of the transcript object returned for recognized speech). The command object includes the command ID and the variables that the integrating application uses to execute the defined action.
Note that the id, phrases, and variables shown below are mere examples. Define these values as needed for your own commands. Additionally, your application must manage the actions to be triggered by recognized commands.
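
Since your application owns the actions, one way to organize them is a registry keyed by command ID. The sketch below is an illustration under assumptions, not the documented response format: `registerCommand` and `dispatchCommand` are hypothetical names, and the command object is assumed to carry `id` and a `variables` map, which is how the event handler later on this page reads it.

```javascript
// Hypothetical registry that maps a command id to the function handling it.
const commandHandlers = new Map();

function registerCommand(id, handler) {
  commandHandlers.set(id, handler);
}

// Assumed command shape:
//   { id: "go_to_section", variables: { section_key: "plan" } }
function dispatchCommand(command) {
  const handler = commandHandlers.get(command.id);
  if (!handler) {
    return { handled: false, reason: `No handler registered for "${command.id}"` };
  }
  try {
    handler(command.variables ?? {});
    return { handled: true };
  } catch (error) {
    // Report the failure to the caller instead of letting it escape.
    return { handled: false, reason: error.message };
  }
}
```

A handler is then registered once per command, for example `registerCommand("go_to_section", (variables) => { /* focus the section */ })`, and every incoming command object is passed to `dispatchCommand`.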
Navigation

Command to navigate around your application (e.g., to a section within the current document template). The command includes a defined list of words that can be recognized for the section_key variable.
commands: [
    {
      id: "go_to_section",
      phrases: ["go to {section_key} section"],
      variables: [
        {
          key: "section_key",
          type: "enum",
          enum: ["subjective", "objective", "assessment", "plan", "next", "previous"]
        }
      ]
    }
]
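
To see which spoken phrases a definition like this covers, the placeholder can be expanded with each enum option. The sketch below assumes the placeholder is simply replaced by each option in turn. `expandPhrases` is a hypothetical helper for illustration and testing, not part of the API.

```javascript
// Hypothetical helper: lists every utterance a command definition covers
// by substituting each enum option for its {placeholder}.
function expandPhrases(command) {
  let utterances = [...command.phrases];
  for (const variable of command.variables ?? []) {
    const placeholder = `{${variable.key}}`;
    utterances = utterances.flatMap((phrase) =>
      phrase.includes(placeholder)
        ? variable.enum.map((option) => phrase.split(placeholder).join(option))
        : [phrase]
    );
  }
  return utterances;
}

const goToSection = {
  id: "go_to_section",
  phrases: ["go to {section_key} section"],
  variables: [
    {
      key: "section_key",
      type: "enum",
      enum: ["subjective", "objective", "assessment", "plan", "next", "previous"]
    }
  ]
};

// Six utterances, from "go to subjective section" to "go to previous section".
const utterances = expandPhrases(goToSection);
```

A list like this can double as a checklist when dictating each phrase to confirm it is recognized.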

Select text

Command to select text in the editor. The command includes a defined list of words that can be recognized for the select_range variable. Your application can define different select actions for each of the options, or add more complex handling for selecting specific words mentioned in the command utterance.
commands: [
    {
      id: "select_range",
      phrases: ["select {select_range}"],
      variables: [
        {
          key: "select_range",
          type: "enum",
          enum: ["all", "the last word", "the last sentence"]
        }
      ]
    }
]

Delete text

Command to delete text. The command includes a defined list of words that can be recognized for the delete_range variable. Your application can define different delete actions for each of the options!
commands: [
    {
      id: "delete_range",
      phrases: ["delete {delete_range}"],
      variables: [
        {
          key: "delete_range",
          type: "enum",
          enum: ["everything", "the last word", "the last sentence", "that"]
        }
      ]
    }
]

Putting it all together

The three commands defined above can be combined into one dictation configuration, with accompanying JavaScript to execute the actions. The example below includes navigation commands to move between sections of a SOAP note template, delete commands to remove the last word, the last sentence, the current selection, or all text, and select commands to select text within the active text field:
Full Dictation Command Configuration
dictation.dictationConfig = {
  primaryLanguage: "en",
  interimResults: true,
  spokenPunctuation: true,
  automaticPunctuation: false,
  commands: [
    {
      id: "go_to_section",
      phrases: ["go to {section_key} section"],
      variables: [
        {
          key: "section_key",
          type: "enum",
          enum: ["subjective", "objective", "assessment", "plan", "next", "previous"]
        }
      ]
    },
    {
      id: "delete_range",
      phrases: ["delete {delete_range}"],
      variables: [
        {
          key: "delete_range",
          type: "enum",
          enum: ["everything", "the last word", "the last sentence", "that"]
        }
      ]
    },
    {
      id: "select_range",
      phrases: ["select {select_range}"],
      variables: [
        {
          key: "select_range",
          type: "enum",
          enum: ["all", "the last word", "the last sentence"]
        }
      ]
    }
  ]
};

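// Assumed to be defined elsewhere by your application: `textareas` is an
// array of the note's <textarea> elements (a real Array, since findIndex is
// used below) and `activeIndex` is the index of the one currently in focus.
dictation.addEventListener("transcript", (e) => {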
  const { data } = e.detail;
  const currentTextarea = textareas[activeIndex];
  const interimTranscriptEl = document.getElementById("interimTranscript");

  if (data.isFinal) {
    // Instead of appending at the end, insert the text at the user's cursor
    // or replace the selected range.
    const start = currentTextarea.selectionStart;
    const end = currentTextarea.selectionEnd;
    const insertedText = data.text + " ";
    const before = currentTextarea.value.slice(0, start);
    const after = currentTextarea.value.slice(end);
    currentTextarea.value = before + insertedText + after;
    const newCursorPos = start + insertedText.length;
    currentTextarea.setSelectionRange(newCursorPos, newCursorPos);
    
    interimTranscriptEl.classList.add("hidden");
  } else {
    interimTranscriptEl.classList.remove("hidden");
    interimTranscriptEl.innerText = data.rawTranscriptText;
  }
});

dictation.addEventListener("command", (e) => {
  const { data } = e.detail;
  document.getElementById("interimTranscript").classList.add("hidden");
  const commandOutput = document.getElementById("commandOutput");

  if (data.id === "go_to_section") {
    const section = data.variables.section_key.toLowerCase();
    if (section === "next") {
      if (activeIndex < textareas.length - 1) {
        activeIndex++;
        textareas[activeIndex].focus();
      }
    } else if (section === "previous") {
      if (activeIndex > 0) {
        activeIndex--;
        textareas[activeIndex].focus();
      }
    } else {
      const index = textareas.findIndex(el => el.id.toLowerCase() === section);
      if (index !== -1) {
        activeIndex = index;
        textareas[activeIndex].focus();
      }
    }
    // textContent keeps the recognized values from being parsed as HTML.
    commandOutput.textContent = "Command: " + data.id + " with section key: " + section;

  } else if (data.id === "delete_range") {
    const range = data.variables.delete_range.toLowerCase();
    const currentTextarea = textareas[activeIndex];
    let content = currentTextarea.value;
    
    if (range === "everything") {
      currentTextarea.value = "";
    } else if (range === "the last word") {
      let words = content.trim().split(/\s+/);
      words.pop();
      currentTextarea.value = words.join(" ") + (words.length > 0 ? " " : "");
    } else if (range === "the last sentence") {
      let sentences = content.match(/[^.!?]+[.!?]*\s*/g);
      if (sentences && sentences.length > 0) {
        sentences.pop();
        currentTextarea.value = sentences.join("");
      }
    } else if (range === "that") {
      // If there's a selection, delete it; otherwise, delete the last word.
      const start = currentTextarea.selectionStart;
      const end = currentTextarea.selectionEnd;
      if (start !== end) {
        currentTextarea.value = content.slice(0, start) + content.slice(end);
      } else {
        let words = content.trim().split(/\s+/);
        words.pop();
        currentTextarea.value = words.join(" ") + (words.length > 0 ? " " : "");
      }
    }
    // textContent keeps the recognized values from being parsed as HTML.
    commandOutput.textContent = "Command: " + data.id + " with delete range: " + range;

  } else if (data.id === "select_range") {
    const range = data.variables.select_range.toLowerCase();
    const currentTextarea = textareas[activeIndex];
    const content = currentTextarea.value;
    
    if (range === "all") {
      currentTextarea.focus();
      currentTextarea.setSelectionRange(0, content.length);
    } else if (range === "the last word") {
      const trimmedContent = content.trimEnd();
      const lastSpaceIndex = trimmedContent.lastIndexOf(" ");
      const start = lastSpaceIndex !== -1 ? lastSpaceIndex + 1 : 0;
      currentTextarea.focus();
      currentTextarea.setSelectionRange(start, trimmedContent.length);
    } else if (range === "the last sentence") {
      const sentences = content.match(/[^.!?]+[.!?]*\s*/g);
      if (sentences && sentences.length > 0) {
        let sum = 0;
        for (let i = 0; i < sentences.length - 1; i++) {
          sum += sentences[i].length;
        }
        const lastSentence = sentences[sentences.length - 1];
        const start = sum;
        const end = start + lastSentence.length;
        currentTextarea.focus();
        currentTextarea.setSelectionRange(start, end);
      }
    }
    // textContent keeps the recognized values from being parsed as HTML.
    commandOutput.textContent = "Command: " + data.id + " with select range: " + range;
  }
});

Best practices

Below are some recommendations, tips, and tricks for building commands that are effective and reliable:
| # | Best practice | Good example | Bad example |
| --- | --- | --- | --- |
| 1 | Use an action verb as the first word in your command phrase, because users are telling the computer to do something | "go to assessment section" | "assessment" |
| 2 | Keep command phrases short and simple so that users can remember them and say them in one concise utterance | "insert normal exam male" | "insert physical exam for male with history of diabetes" |
| 3 | Do not create command phrases with content that also needs to be recognized as dictation text | "pull latest vitals" | "vital signs" |
| 4 | Be mindful of including articles, extraneous terms, and filler words in your command phrases, as the command will only be recognized when the defined phrase is matched in full | "insert normal exam" | "insert the normal exam template" |
| 5 | Use the variables feature to support alternative ways of executing a single command | See delete text above | |
| 6 | Use the variables feature to let a single command support many actions | See navigation above | |
| 7 | Add timeouts and error handling in the client-side application so that a command does not run indefinitely when there are errors executing the action following command recognition | | |
| 8 | Consider which commands should be standardized across all users of the application and which should be opened for custom user configuration | | |
| 9 | Test your commands! Dictate command phrases to make sure they are recognized reliably, and ensure command actions can be validated for correct execution | | |
| 10 | Get creative! Most applications follow linear flows to click through from one place to another, but speech-enabling the application opens a world of opportunity for supporting more dynamic navigation through the workflow | | |
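
The timeout and error handling recommended in item 7 could look roughly like the sketch below. `runWithTimeout` and `executeCommand` are hypothetical names, and the 2000 ms default is an arbitrary placeholder. Note that the timeout only stops waiting for the action. It does not cancel work that has already started.

```javascript
// Hypothetical helper: rejects if the action has not settled within
// timeoutMs. The underlying action is not cancelled, only abandoned.
function runWithTimeout(action, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Command action timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Promise.resolve().then(action) turns synchronous throws into rejections.
  return Promise.race([Promise.resolve().then(action), timeout])
    .finally(() => clearTimeout(timer));
}

// Runs one command action and reports the outcome instead of throwing.
async function executeCommand(id, action, timeoutMs = 2000) {
  try {
    await runWithTimeout(action, timeoutMs);
    return { id, ok: true };
  } catch (error) {
    return { id, ok: false, error: error.message };
  }
}
```

The returned object can then drive user feedback, for example showing a short message when `ok` is false so the user knows to repeat the command.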
Use of commands is optional for dictation configuration. They should be configured based on how the integrating application will perform the actions. Please contact us for more information or for help with building commands.