voice-note-to-midi

Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing.
Install via ClawdBot CLI:

clawdbot install danbennettuk/voice-note-to-midi

Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Audit findings (audited Apr 17, 2026 · audit v1.0):
- Calls external URL not in known-safe list: https://github.com/spotify/basic-pitch
- Uses known external API (expected, informational): raw.githubusercontent.com
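The audit above points at Spotify's basic-pitch repository, which suggests it is the ML pitch detector behind the conversion. Below is a minimal sketch of such a pipeline, assuming the basic-pitch Python package is installed; the file names, 120 BPM tempo, and 16th-note grid are illustrative assumptions, not the skill's actual defaults.

```python
from basic_pitch.inference import predict

# ML pitch detection: predict() returns the raw model output, a
# pretty_midi.PrettyMIDI object, and a list of note events.
model_output, midi_data, note_events = predict("voice_note.wav")

# Illustrative post-processing: snap note boundaries to a 16th-note grid
# at an assumed tempo of 120 BPM (the skill's quantization logic may differ).
bpm = 120.0
grid = 60.0 / bpm / 4  # 16th-note length in seconds

for instrument in midi_data.instruments:
    for note in instrument.notes:
        note.start = round(note.start / grid) * grid
        note.end = max(note.start + grid, round(note.end / grid) * grid)

midi_data.write("voice_note_quantized.mid")
```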
Musicians and producers can quickly convert vocal melodies or humming ideas into MIDI files for use in digital audio workstations (DAWs), enabling easy arrangement, editing, and instrumentation without manual note input. This accelerates the creative workflow, especially for capturing spontaneous ideas during songwriting sessions.
Music teachers and students can use this skill to transcribe singing or humming exercises into MIDI for analysis, feedback, and practice, helping learners visualize pitch accuracy and timing. It supports ear training and composition classes by providing instant notation from audio recordings.
Individuals with disabilities or those who find traditional music notation challenging can hum melodies to generate MIDI files, making music creation more accessible. This can be integrated into assistive devices or apps for expressive communication through music.
Sound designers and composers for games, films, or animations can convert vocal sketches into MIDI to prototype musical themes or sound effects efficiently. This allows for quick iteration and integration with scoring software, enhancing production timelines.
Researchers and ethnomusicologists can transcribe field recordings of vocal performances or traditional music into MIDI for analysis, preservation, and study of melodic structures. This aids in documenting cultural heritage and analyzing musical patterns computationally.
Offer a free basic version with limited features (e.g., coarser quantization grids) and charge for premium features such as advanced key-aware correction, batch processing, or cloud storage. This model attracts hobbyists while monetizing professional users in music production.
License the skill to companies developing music education software, DAWs, or creative apps, integrating it as a core feature. This provides steady revenue through licensing agreements and expands reach in the edtech and entertainment sectors.
Provide an API for developers to integrate audio-to-MIDI conversion into their own applications, such as mobile apps or web tools, with usage-based pricing. This model taps into the developer community and enables scalable, on-demand access to the technology.
💬 Integration Tip
Ensure FFmpeg is installed for broad audio format support, and test with recordings of varying quality to verify the ML model's accuracy in real-world conditions.
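As an example of that FFmpeg preprocessing step, the sketch below converts an incoming voice note to mono WAV before pitch detection; the file names and the 22.05 kHz sample rate are illustrative assumptions.

```python
import subprocess

# Convert a phone voice note (e.g. .m4a) to mono WAV so the pitch detector
# receives a consistent format; requires the ffmpeg binary on PATH.
subprocess.run(
    [
        "ffmpeg", "-y",          # overwrite output if it already exists
        "-i", "voice_note.m4a",  # input in any FFmpeg-supported format
        "-ac", "1",              # downmix to mono
        "-ar", "22050",          # resample to 22.05 kHz
        "voice_note.wav",
    ],
    check=True,
)
```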
Related skills:
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.