Midjourney Dataset

Tens of millions of Midjourney prompt-and-image entries for training and fine-tuning your image generation models!

We offer 20 million (and counting) Midjourney prompt+image entries for your training needs, categorized by model version, style, composition, and operation. We deliver the dataset through downloadable URLs or mail hard disk drives right to your door!

Empowering your model training & fine-tuning

20 million data entries collected as of February 2024, with about 2 million more added per month!

Data entries are sorted by model version, style, medium, and composition, and can be further customized to contain specific keywords in prompts.

Flexible payment plans for one-time dataset purchase or long-term dataset provision. Delivery methods include download URLs or mailed hard drives.

Dataset statistics

Midjourney model version breakdown (ring chart, innermost to outermost ring):
  • Model 4: 0.46%
  • Model 5: 3.46%
  • Model 5.1: 0.55%
  • Model 5.2: 40.78%
  • Model 6: 54.84%

Operation type breakdown:
  • Imagine: 78.3%
  • Upscale: 15.5%
  • Variation: 3.4%
  • Reroll: 1.6%
  • Describe: 0.6%
  • Outpaint: 0.5%
  • Pan: 0.2%

Delivery Methods

Downloadable URLs
  • The client creates a cloud storage location.
  • The dataset is uploaded to the client’s cloud storage.
  • Expect long upload times for large datasets. (A test transfer can be done to estimate the upload time.)
Physical Hard Drive
  • Fast delivery time! Pre-loaded hard drives ready for shipping!
  • Worldwide shipping available; let us know your location!
  • Hard drive cost: $20/TB (6 TB ≈ 1 million imagine entries)

Our Pricing

Unit Pricing
$0.003/entry
  • Applies to the downloadable URLs delivery method only. For physical hard drive delivery, the cost of the hard drives is not included in this price.
Minimum Purchase Entries
2,000,000
  • One data entry includes: one prompt and its corresponding result(s)
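As a rough illustration of how the figures above combine, here is a minimal Python sketch of a cost estimate. The unit price, minimum purchase, drive cost, and the 6 TB per million entries ratio come from this page; the function name `estimate_cost`, rounding drive capacity up to whole terabytes, and applying the 6 TB ratio to every entry type are assumptions made for the example. Shipping and any plan discounts are not modeled.

```python
import math

UNIT_PRICE_USD = 0.003        # per entry, as listed under "Unit Pricing"
MIN_ENTRIES = 2_000_000       # minimum purchase, as listed above
DRIVE_COST_USD_PER_TB = 20    # hard drive cost, as listed under "Delivery Methods"
TB_PER_MILLION_ENTRIES = 6    # approximate; the page states it for imagine entries


def estimate_cost(entries: int, physical_drive: bool = False) -> float:
    """Estimate the purchase cost in USD (hypothetical helper, not an official quote)."""
    if entries < MIN_ENTRIES:
        raise ValueError(f"minimum purchase is {MIN_ENTRIES:,} entries")
    total = entries * UNIT_PRICE_USD
    if physical_drive:
        # Assumption: drive capacity is billed in whole terabytes, rounded up.
        terabytes = math.ceil(entries / 1_000_000 * TB_PER_MILLION_ENTRIES)
        total += terabytes * DRIVE_COST_USD_PER_TB
    return round(total, 2)


print(estimate_cost(2_000_000))                       # download delivery only
print(estimate_cost(2_000_000, physical_drive=True))  # adds 12 TB of drives
```

Under these assumptions, the minimum purchase of 2,000,000 entries comes to $6,000 for download delivery, plus $240 for 12 TB of drives if shipped physically.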

Frequently Asked Questions

Midjourney is an independent research team that provides a state-of-the-art text-to-image generation model to the public through a Discord server, where users interact with the Midjourney bot. Users send a query in natural language (i.e. a "prompt"), and the Midjourney bot returns four high-quality images and offers further options such as upscaling or generating variations of the original images.

Our dataset was obtained by scraping the Midjourney Gallery. We have no affiliation with Midjourney; we provide this data to enable tasks such as prompt engineering research, prompt analysis, and training text-to-image generative AI models.

We have around 20 million data entries collected from July 2023 to February 2024, and our team is currently adding about 2 million entries per month!

Every entry in our dataset has a unique taskID, which corresponds to a PNG file (i.e. the image itself) and a JSON file containing the corresponding prompt and other metadata such as the creation time (as a Unix timestamp).
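To make the entry layout concrete, here is a minimal Python sketch of pairing an image with its metadata by taskID. The file naming (`<taskID>.png` and `<taskID>.json` in one folder) and the JSON keys `prompt` and `created_at` are assumptions for illustration; the actual names in a delivered dataset may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_entry(root: Path, task_id: str) -> dict:
    """Pair one entry's image path with its JSON metadata (hypothetical layout)."""
    image_path = root / f"{task_id}.png"   # assumed file naming
    meta_path = root / f"{task_id}.json"   # assumed file naming
    with meta_path.open(encoding="utf-8") as fh:
        meta = json.load(fh)

    created = None
    if "created_at" in meta:               # "created_at" is a hypothetical key
        # Convert the Unix timestamp to a timezone-aware UTC datetime.
        created = datetime.fromtimestamp(meta["created_at"], tz=timezone.utc)

    return {
        "task_id": task_id,
        "image_path": image_path,
        "prompt": meta.get("prompt"),      # "prompt" is a hypothetical key
        "created": created,
    }
```

A call such as `load_entry(Path("dataset"), "some-task-id")` would then return the prompt, the image path, and the creation time in one dictionary, ready to feed into a training pipeline.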

Yes, please reach us through our official Telegram account and we will provide one for you.

Yes. We can filter by operation type, specific keywords, prompt length, model version (e.g. V5.2 or V6), style, composition (e.g. portrait, close-up, headshot), medium (e.g. painting, illustration, photorealistic), aspect ratio, and more.
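For readers who want to apply similar filters themselves after delivery, here is a minimal Python sketch covering three of the criteria above: keywords, model version, and prompt length. The entry keys `prompt` and `model` and the function name `matches` are hypothetical; this is not a description of how the filtering is done on our side.

```python
def matches(entry: dict, *, keywords=(), model=None, max_prompt_len=None) -> bool:
    """Return True if an entry passes every filter that was supplied."""
    prompt = entry.get("prompt", "")       # "prompt" is a hypothetical key
    if model is not None and entry.get("model") != model:  # hypothetical key
        return False
    if max_prompt_len is not None and len(prompt) > max_prompt_len:
        return False
    lowered = prompt.lower()
    # Every keyword must appear in the prompt (case-insensitive substring match).
    return all(word.lower() in lowered for word in keywords)


entries = [
    {"prompt": "Portrait of a red fox, oil painting", "model": "V6"},
    {"prompt": "City skyline at night", "model": "V5.2"},
]
selected = [e for e in entries if matches(e, keywords=["portrait"], model="V6")]
print(len(selected))  # 1
```

Substring matching is the simplest choice here; a tokenized or regular-expression match would be a reasonable alternative when keywords can appear inside longer words.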

There are two ways of dataset delivery: we can either provide download URLs for smaller data delivery batches or mail physical hard disk drives (HDDs) containing the data to you.

Yes, we provide additional services such as auto-annotating datasets with models such as GPT-4V (vision).

© Powered by MidjourneyDataset.

All rights reserved.