Midjourney Dataset

Tens of millions of Midjourney prompt and image data entries to train and fine-tune your image generation models!

We offer 20 million (and counting) Midjourney prompt+image data entries for your training needs, categorized by model version, style, composition, and operation type. We deliver the dataset through downloadable URLs or by mailing hard disk drives right to your door!

Empowering your model training & fine-tuning

20 million data entries collected (as of February 2024), and we are adding about 2 million data entries per month!

Data entries are sorted by model version, style, medium, and composition, and can be further filtered to contain specific keywords in prompts.

Flexible payment plans for one-time dataset purchases or long-term dataset provision. Delivery methods include download URLs or mailed hard drives.

Dataset statistics

Model version breakdown (ring chart, innermost to outermost ring):
  • Model 4: 0.46%
  • Model 5: 3.46%
  • Model 5.1: 0.55%
  • Model 5.2: 40.78%
  • Model 6: 54.84%

Operation type breakdown:
  • Imagine: 78.3%
  • Upscale: 15.5%
  • Variation: 3.4%
  • Reroll: 1.6%
  • Describe: 0.6%
  • Zoom: 0.5%
  • Pan: 0.2%

Delivery Methods

Downloadable URLs
  • The client sets up a cloud storage location.
  • The dataset will be uploaded to the client's cloud storage.
  • Expect long upload times for large datasets (a test transfer can be run to estimate upload time).
Physical Hard Drive
  • Fast delivery time! Pre-loaded hard drives are ready for shipping!
  • Worldwide shipping is available; let us know your location!
  • Hard drive cost: $20/TB (6 TB ≈ 1 million Imagine entries)
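As a rough sizing aid, the hard drive arithmetic above can be sketched in Python. This is a minimal sketch that assumes the quoted ratio of about 6 TB per 1 million Imagine entries scales linearly and that storage is rounded up to whole terabytes; the helper name `estimate_drive_cost` is hypothetical, and actual dataset sizes will vary.

```python
# Figures quoted in the Physical Hard Drive bullets; linear scaling is an assumption.
PRICE_PER_TB_USD = 20
TB_PER_MILLION_IMAGINE = 6


def estimate_drive_cost(imagine_entries: int) -> tuple[int, int]:
    """Return (terabytes needed, hard drive cost in USD) for a number of Imagine entries.

    Hypothetical helper: rounding up to whole terabytes is an assumption,
    not a stated policy.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    terabytes = (imagine_entries * TB_PER_MILLION_IMAGINE + 999_999) // 1_000_000
    return terabytes, terabytes * PRICE_PER_TB_USD


tb, cost = estimate_drive_cost(2_000_000)
print(f"{tb} TB, ${cost}")  # 12 TB, $240
```

Integer arithmetic is used so that an exact multiple of 1 million entries never rounds up by accident.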

Our Pricing

Option 1: No Custom Filtering Logic

Unit price: $0.003/entry
Minimum purchase: 2,000,000 entries

Option 2: With Custom Filtering Logic

Unit price: $0.004/entry
Minimum purchase: 1,000,000 entries

Custom Logic Filtering Steps:

  1. We send you a file containing 2 million metadata records (including operation type, model version, prompt, and data entry ID; see the example below).
  2. You write a filtering script yourself to select the desired data entries.
  3. You send the working script back to us.
  4. We run the script on all existing data entries and deliver the matching results.
  5. If you require further data cleaning services (e.g. PNG-to-WebP conversion or splitting images), we charge an additional 10%.
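The filtering script from the steps above might look like the following minimal sketch. It assumes the metadata file is JSON Lines (one JSON object per line) and uses hypothetical field names: `id`, `action`, and `prompt` follow the sample JSON fields listed further down, while `version` for the model version is a guess. Adjust the keys to whatever the real metadata file contains.

```python
import json
from typing import Iterable, Iterator


def filter_entries(lines: Iterable[str], keywords: list[str],
                   action: str = "imagine", version: str = "6") -> Iterator[str]:
    """Yield IDs of entries matching the action, model version, and any keyword.

    Assumes JSON Lines input; the "version" key is hypothetical.
    """
    wanted = [k.lower() for k in keywords]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("action") != action or entry.get("version") != version:
            continue
        prompt = entry.get("prompt", "").lower()
        if any(k in prompt for k in wanted):  # case-insensitive keyword match
            yield entry["id"]


# Made-up records for illustration only.
sample = [
    '{"id": "a1", "action": "imagine", "version": "6", "prompt": "Portrait of a cat, oil painting"}',
    '{"id": "b2", "action": "upscale", "version": "6", "prompt": "portrait of a dog"}',
    '{"id": "c3", "action": "imagine", "version": "5.2", "prompt": "portrait, close-up"}',
]
print(list(filter_entries(sample, ["portrait"])))  # ['a1']
```

Because an open text file iterates line by line, the same function can be pointed at the delivered metadata file directly instead of the in-memory `sample` list.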

Option 3: Custom Dataset Curation!

Unit price: $0.02/Midjourney result
Minimum generation quantity: 25,000 results
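To compare the three pricing options above, the total is simply unit price times quantity, subject to each option's minimum. The sketch below assumes download-URL delivery (so no hard drive cost) and ignores the 10% data-cleaning surcharge; the `quote` helper is hypothetical and is not an official price calculator.

```python
from decimal import Decimal

# Unit prices and minimum quantities quoted in the pricing options above.
OPTIONS = {
    1: (Decimal("0.003"), 2_000_000),  # no custom filtering, per entry
    2: (Decimal("0.004"), 1_000_000),  # custom filtering logic, per entry
    3: (Decimal("0.02"), 25_000),      # custom curation, per Midjourney result
}


def quote(option: int, quantity: int) -> Decimal:
    """Return the USD price for the download-URL delivery method.

    Hypothetical helper: raises ValueError below the option's minimum and
    excludes hard drive costs and the data-cleaning surcharge.
    """
    unit_price, minimum = OPTIONS[option]
    if quantity < minimum:
        raise ValueError(f"option {option} requires at least {minimum:,} units")
    return (unit_price * quantity).quantize(Decimal("0.01"))


print(quote(1, 2_000_000))  # 6000.00
print(quote(2, 1_000_000))  # 4000.00
print(quote(3, 25_000))     # 500.00
```

`Decimal` keeps the sub-cent unit prices exact, which plain floats would not.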

Curation Steps:

  1. You send us the custom Imagine prompts you want to run in Midjourney.
  2. We run these prompts in Midjourney and return links to the generated images for you to download!

Delivery and Content:

  1. The unit prices above apply to the downloadable-URL delivery method only; the cost of hard drives for physical delivery is not included.
  2. One data entry consists of one JSON file and the corresponding image results, for example:

1. One JSON file (containing the fields below)

[Image: sample JSON file for a data entry, with fields such as id, timestamp, action, prompt, result, and index]

2. Corresponding results

[Image: sample output image corresponding to the sample JSON file]
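A data entry's JSON file could be loaded into a typed record as in the sketch below. The field names come from the sample JSON description above, but their types are assumptions: `timestamp` is taken to be Unix time in seconds, `result` a link or filename for the image, and `index` the image's position within the result. The sample values are made up.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DataEntry:
    id: str
    timestamp: int  # assumed: Unix time in seconds
    action: str     # operation type, e.g. "imagine" or "upscale"
    prompt: str
    result: str     # assumed: link or filename of the image result
    index: int      # assumed: position of the image within the result

    @property
    def created_at(self) -> datetime:
        """Creation time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_entry(text: str) -> DataEntry:
    raw = json.loads(text)
    return DataEntry(
        id=str(raw["id"]),
        timestamp=int(raw["timestamp"]),
        action=raw["action"],
        prompt=raw["prompt"],
        result=raw["result"],
        index=int(raw["index"]),
    )


# Made-up values for illustration only.
sample = ('{"id": "task-123", "timestamp": 1706745600, "action": "imagine", '
          '"prompt": "a lighthouse at dusk --v 6", "result": "task-123.png", "index": 0}')
entry = parse_entry(sample)
print(entry.action, entry.created_at.isoformat())  # imagine 2024-02-01T00:00:00+00:00
```

If the real timestamps turn out to be in milliseconds, divide by 1000 before calling `fromtimestamp`.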

Frequently Asked Questions

Midjourney is an independent research team that provides a state-of-the-art text-to-image generation model to the public through a Discord server, where users interact with the Midjourney bot. Users send a query in natural language (a "prompt"), and the Midjourney bot returns four images and offers further options such as upscaling or generating variations of the original images.

Our dataset was obtained by scraping the Midjourney Gallery. We are not affiliated with Midjourney; we provide this data to enable tasks such as prompt engineering research, prompt analysis, and training text-to-image generative AI models.

We collected around 20 million data entries between July 2023 and February 2024, and our team is currently adding about 2 million entries per month!

Every entry in our dataset has a unique taskID, which corresponds to a PNG file (the image itself) and a JSON file containing the prompt and other metadata such as the creation time (as a Unix timestamp).
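Since each taskID ties an image to its metadata, a delivered batch can be paired up by file name. This minimal sketch assumes a hypothetical layout in which both files sit in one directory and are named `<taskID>.json` and `<taskID>.png`; the actual delivery layout may differ, so treat the naming as an assumption.

```python
import tempfile
from pathlib import Path


def pair_entries(directory: Path) -> dict[str, tuple[Path, Path]]:
    """Map each taskID to its (JSON, PNG) paths.

    Assumes the hypothetical naming '<taskID>.json' / '<taskID>.png';
    JSON files without a matching PNG are skipped.
    """
    pairs = {}
    for json_path in sorted(directory.glob("*.json")):
        png_path = json_path.with_suffix(".png")
        if png_path.exists():
            pairs[json_path.stem] = (json_path, png_path)
    return pairs


# Demo on throwaway files: task-2 has no image, so it is left out.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("task-1.json", "task-1.png", "task-2.json"):
        (root / name).write_text("")
    print(sorted(pair_entries(root)))  # ['task-1']
```

Skipping unmatched files makes incomplete entries easy to spot before training.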

Yes, please reach us through our official Telegram account and we will provide one for you.

Yes. We can filter by operation type, specific keywords, prompt length, model version (e.g. V5.2 or V6), style, composition (e.g. portrait, close-up, headshot), medium (e.g. painting, illustration, photorealistic), aspect ratio, and more.

We offer two delivery methods: download URLs for smaller delivery batches, or physical HDD hard disk drives containing the data mailed to you.

Yes. We provide additional services such as datasets auto-annotated with models like GPT-4V (vision).

© Powered by MidjourneyDataset.

All rights reserved.