Memo AI Model Downloads

Memo AI offers several local speech-to-text models; choose one from the list below to download. All of them have been tested, with the following general findings:

For English audio without long stretches of silence, background music, or significant noise, the Whisper model can reach up to 99% accuracy. Audio that does not meet these conditions is prone to hallucinated output.
On low-performance machines, transcription errors become more likely.

Large-V1

The Large-V1 model can transcribe over 96 languages and reaches up to 99% accuracy for languages such as Spanish and English. However, inference is relatively slow.

Type: High-quality general model
Minimum Memory: 16G
Recommendation: 🌟🌟🌟🌟
Download Link: Large-V1

Large-V2

The Large-V2 model can transcribe over 96 languages and reaches up to 99% accuracy for languages such as Spanish and English. Compared to V1, it was trained on more data and performs better.

Type: High-quality general model
Minimum Memory: 16G
Recommendation: 🌟🌟🌟🌟🌟
Download Link: Large-V2

Large-V3

The Large-V3 model was trained on more data than V2 and adds Cantonese recognition. However, it is less stable and often outputs repetitive content. It is generally not recommended unless you have a powerful machine and high-quality audio.

Type: High-quality general model
Minimum Memory: 16G
Recommendation: 🌟🌟🌟
Download Link: Large-V3

Medium

The Medium model is smaller than the Large models, with roughly half as many parameters. It produces good-quality transcriptions for languages such as English and Spanish, but tends to have higher error rates for Chinese and Japanese.

Type: High-quality general model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟🌟
Download Link: Medium

Distil-large-v3

This model segments English speech very well and offers fast, good-quality transcription. However, it struggles with content that mixes two languages.

Type: High-quality English model
Minimum Memory: 16G
Recommendation: 🌟🌟🌟🌟
Download Link: Distil-large-v3

Medium.en

The Medium.en model is the English-only counterpart of Medium and, like it, is smaller than the Large models. Note that it can only transcribe English.

Type: High-quality English model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟🌟
Download Link: Medium.en

Small

Type: Balanced general model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟🌟
Download Link: Small

Small.en

Type: Balanced English model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟🌟
Download Link: Small.en

Base

Type: Speed general model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟🌟
Download Link: Base

Base.en

Type: Speed English model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟🌟
Download Link: Base.en

Tiny

Type: Speed general model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟
Download Link: Tiny

Tiny.en

Type: Speed English model
Minimum Memory: 8G
Recommendation: 🌟🌟🌟
Download Link: Tiny.en
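
The list above can also be read as a small lookup table. The following Python sketch is only an illustration and not part of Memo AI: the names Model, CATALOG, and candidates are hypothetical, the memory figures are read as gigabytes, and every model whose type says "English model" is assumed to be unsuitable for non-English audio. It filters the list by available memory and audio language, then orders the result by the recommendation stars given on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    english_only: bool   # assumption: Type contains "English model"
    min_memory_gb: int   # "Minimum Memory" from this page, read as GB
    stars: int           # number of recommendation stars on this page


# Values copied from the model list on this page, in page order.
CATALOG = [
    Model("Large-V1", False, 16, 4),
    Model("Large-V2", False, 16, 5),
    Model("Large-V3", False, 16, 3),
    Model("Medium", False, 8, 4),
    Model("Distil-large-v3", True, 16, 4),
    Model("Medium.en", True, 8, 4),
    Model("Small", False, 8, 4),
    Model("Small.en", True, 8, 4),
    Model("Base", False, 8, 4),
    Model("Base.en", True, 8, 4),
    Model("Tiny", False, 8, 3),
    Model("Tiny.en", True, 8, 3),
]


def candidates(memory_gb: int, english_audio: bool) -> list[Model]:
    """Return models that fit in memory and match the audio language.

    Results are ordered by recommendation stars, highest first; the
    sort is stable, so ties keep the order used on this page.
    """
    fits = [
        m for m in CATALOG
        if m.min_memory_gb <= memory_gb
        and (english_audio or not m.english_only)
    ]
    return sorted(fits, key=lambda m: -m.stars)


if __name__ == "__main__":
    for model in candidates(memory_gb=8, english_audio=False):
        print(model.name, "*" * model.stars)
```

Calling candidates(16, False) puts Large-V2 first, which matches its five-star recommendation above. The star ordering is a rough guide only; audio quality and machine performance still affect the results, as noted at the top of this page.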

Chinese and Japanese Exclusive Models

These models are available to members only. Send your proof of purchase to [email protected], and you will receive the download link for the corresponding model by email.

Model Usage Tutorial

In Memo AI, open Settings > Model Management and click Import Model in the upper right corner. Then return to the homepage and select the model when adding audio. To avoid download issues, it is generally recommended to route traffic for the domain https://huggingface.co/ through a proxy.
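
If you prefer to fetch a model file yourself before importing it, the proxy advice can be sketched in Python using only the standard library. This is an illustration, not Memo AI's own download mechanism: the function names build_proxy_opener and download_via_proxy are hypothetical, and the proxy address and file URL are placeholders that you must replace with your own values.

```python
import shutil
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Create a URL opener that sends HTTP and HTTPS requests via a proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


def download_via_proxy(url: str, dest: str, proxy_url: str) -> None:
    """Stream the file at `url` to `dest` through the given proxy."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=30) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


# Example call (placeholders only; the proxy address is an assumption,
# and the model URL is the one behind the Download Link you chose):
# download_via_proxy(
#     url="<model download link from this page>",
#     dest="model.bin",
#     proxy_url="http://127.0.0.1:7890",
# )
```

Once the file is on disk, import it through Model Management as described above.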
