
- Structural decomposition and complete prompts to ensure reproducibility -
- Introduction
- Prompts are “blueprints” rather than “text”
- Three factors that reduce reproducibility
- Design principles to improve reproducibility
- Complete Practical Level Prompts
- The essence of this prompt
- Application: How to use prompts correctly
- Summary
- Detailed explanation of the above prompts
- ① Bias in learning data
- ② Accuracy difference in token decomposition
- ③ English is the standard language for photography terms
- ■Case where Japanese is suitable
- ●Recommended structure
- ●Example
- ●① Completely in English + internal design in Japanese
- ●② “Intentional translation” from Japanese to English
Introduction
In portrait generation, there is a problem that many people run into.
It is instability: "a different image appears every time, even with the same prompt."
The cause is not simply that the prompt is short.
The real issue is that the prompt is unstructured.
This article breaks portrait-generation prompts down into a "production process" and presents design methods that ensure reproducibility, together with a complete, practical prompt.
Prompts are “blueprints” rather than “text”
A typical prompt tends to look like this:
- beautiful woman
- cinematic
- high quality
But in terms of the production process, this amounts to saying "please take a nice photo somehow."
In other words, nothing is under control.
What matters is breaking the prompt down as follows.
■Basic structure of a portrait prompt
- Subject design (Subject)
- Styling
- Pose and composition (Pose / Composition)
- Lighting
- Environment
- Camera settings (Camera)
- Color/tone
- Style
- Emotion/Meaning (Mood)
- Quality control
This structure matches the production process of a live-action shoot.
So a prompt is, in effect, a written set of shooting directions.
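To make the "blueprint" idea concrete, here is a minimal Python sketch that treats the ten blocks as named slots and joins them in a fixed order. The block identifiers and the `build_prompt` helper are hypothetical names introduced for illustration; the example values are abbreviated from the complete prompt later in this article.

```python
# Block names mirror the ten-part structure listed above; the identifiers
# themselves are hypothetical and only serve this illustration.
BLOCK_ORDER = [
    "subject", "styling", "pose", "lighting", "environment",
    "camera", "color", "style", "mood", "quality",
]


def build_prompt(blocks: dict[str, str]) -> str:
    """Join the blocks in a fixed order, refusing to leave any block empty."""
    missing = [name for name in BLOCK_ORDER if not blocks.get(name, "").strip()]
    if missing:
        raise ValueError("unspecified blocks: " + ", ".join(missing))
    return ", ".join(blocks[name].strip() for name in BLOCK_ORDER)


example_blocks = {
    "subject": "portrait of a 26-year-old Japanese woman",
    "styling": "white silk blouse, no accessories",
    "pose": "seated upright, shoulders angled 15 degrees to camera",
    "lighting": "single soft light at 45° camera left",
    "environment": "indoor studio, plain warm gray background",
    "camera": "85mm prime lens, f/1.8, eye-level, 1.5 meters distance",
    "color": "warm tone, low contrast",
    "style": "editorial fashion photography",
    "mood": "neutral calm expression",
    "quality": "sharp focus on eyes",
}

prompt = build_prompt(example_blocks)
print(prompt)
```

Raising an error on an empty block is a deliberate choice here: an unspecified block is exactly the part the AI would otherwise fill in freely.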
Three factors that reduce reproducibility
① Frequent use of abstract words
Words such as "beautiful" and "cinematic" can be interpreted too broadly, so the result changes every time.
② Undefined physical conditions
If the light position, camera distance, focal length, and so on are not specified, the composition drifts.
③ Ignoring noise factors
If too many parts, such as the background, hair, and facial expression, are left for the AI to interpret freely, the output becomes unstable.
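The first two factors can be checked mechanically before a prompt is ever submitted. The sketch below is a rough, hypothetical linter: the word list contains only the abstract terms mentioned in this article, and the regular expressions are simple assumptions about how focal length, distance, and light angle are usually written.

```python
import re

# Abstract words named in this article; extend the tuple for your own use.
ABSTRACT_WORDS = ("beautiful", "cinematic", "high quality")

# Rough patterns for physical conditions (assumed notations, not exhaustive).
PHYSICAL_PATTERNS = {
    "focal length": r"\b\d+\s*mm\b",
    "camera distance": r"\b\d+(?:\.\d+)?\s*(?:m|meters?)\b",
    "light angle": r"\d+\s*(?:°|degrees?)",
}


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for abstract words and missing physical conditions."""
    text = prompt.lower()
    warnings = []
    for word in ABSTRACT_WORDS:
        if word in text:
            warnings.append(f"abstract word: {word}")
    for label, pattern in PHYSICAL_PATTERNS.items():
        if not re.search(pattern, text):
            warnings.append(f"missing physical condition: {label}")
    return warnings


for warning in lint_prompt("beautiful woman, cinematic, high quality"):
    print(warning)
```

A prompt that states an angle, a focal length, and a distance, and avoids the listed words, comes back with no warnings.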
Design principles to improve reproducibility
■1. Quantify as much as possible
- Distance (1.5m)
- Angle (45°)
- Color (hex code such as #d6d1cc)
■2. Fix the elements that tend to drift
- Expression (neutral)
- Gaze (toward the camera)
- Hairstyle (length/parting)
■3. Prevent deviations with negative prompts
- Exclude anime / cartoon styles
- Exclude over-retouching
- Exclude background noise
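Principle 1 can be sketched as a small data structure that only accepts numbers and color codes, never adjectives. `ShotSpec` and its field names are hypothetical; the values come from the complete prompt in this article, and the validation ranges are assumptions.

```python
import re
from dataclasses import dataclass

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


@dataclass(frozen=True)
class ShotSpec:
    """Hypothetical container for the quantities named in principle 1."""
    distance_m: float
    light_angle_deg: int
    background_hex: str

    def __post_init__(self) -> None:
        # Reject anything that is not a concrete, checkable value.
        if self.distance_m <= 0:
            raise ValueError("distance must be positive")
        if not 0 <= self.light_angle_deg <= 180:
            raise ValueError("light angle must be between 0 and 180 degrees")
        if not HEX_COLOR.fullmatch(self.background_hex):
            raise ValueError("background must be a #rrggbb color code")

    def to_prompt(self) -> str:
        """Render the quantities as prompt text."""
        return (
            f"camera position: {self.distance_m} meters distance, "
            f"single soft light at {self.light_angle_deg}° camera left, "
            f"plain background ({self.background_hex})"
        )


spec = ShotSpec(distance_m=1.5, light_angle_deg=45, background_hex="#d6d1cc")
print(spec.to_prompt())
```

Passing a word like "warm gray" instead of a color code fails immediately, which is the point: vague values are caught before they reach the model.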
Complete Practical Level Prompts
Below is a prompt designed to maximize reproducibility.
ultra-realistic portrait of a 26-year-old Japanese woman,
height 165cm, slim build, narrow shoulders, long neck,
oval face shape, small chin, straight nose bridge, slightly wide-set almond eyes,
dark brown iris (#3b2f2f), clear sclera, natural eyelashes,
light smooth skin with subtle pores, no blemishes, no freckles,
neutral calm expression, lips slightly closed, no smile,
eye contact directly to camera,
hair: medium-length (shoulder length), straight with slight inward curl at ends,
natural black (#1a1a1a), center part, slight volume at crown, no stray hair,
outfit: white silk blouse, matte texture, no patterns, slightly loose fit,
top button open, soft fabric folds, no accessories,
pose: seated upright on a chair, spine straight but relaxed,
hands resting gently on thighs, fingers naturally curved,
shoulders slightly angled (15 degrees to camera),
camera position: eye-level, 1.5 meters distance, centered framing,
framing: chest-up portrait, head near top margin,
lens: 85mm prime lens,
aperture: f/1.8,
depth of field: shallow, sharp focus on eyes, background fully blurred,
lighting setup:
single soft light at 45° camera left, slightly above eye level,
soft shadows on opposite side, subtle reflector fill (10%),
environment:
indoor studio, plain warm gray background (#d6d1cc), no objects,
color grading:
warm tone, low contrast, soft highlights, natural skin tones,
style:
editorial fashion photography, realistic, non-stylized,
negative prompt:
cartoon, anime, bad anatomy, extra fingers, blur, noise,
overexposed, harsh shadows, plastic skin, busy background,
--ar 2:3 --q 2 --style raw --seed 12345
The essence of this prompt
The value of this prompt is not its length.
What matters is that it:
- leaves no room for interpretation
- fixes the variables
- fully verbalizes the production process
Application: How to use prompts correctly
This complete version is not a "finished form" but a "base."
For example:
- Change only the lighting
- Change only the hairstyle
- Change only the lens
By swapping out one element at a time like this, you can design the variations you want.
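As a rough illustration of the "base plus one change" workflow, the sketch below copies a base set of blocks and replaces exactly one of them. The helper names and the 50mm replacement value are hypothetical; the seed is the one used in the complete prompt.

```python
# A reduced base taken from the complete prompt; only three blocks are shown.
BASE_BLOCKS = {
    "hair": "hair: medium-length (shoulder length), straight, center part",
    "lighting": "single soft light at 45° camera left, slightly above eye level",
    "lens": "lens: 85mm prime lens",
}
SEED = 12345  # kept constant across variations, as in the complete prompt


def make_variation(base: dict[str, str], block: str, new_text: str) -> dict[str, str]:
    """Return a copy of the base with exactly one block replaced."""
    if block not in base:
        raise KeyError(f"unknown block: {block}")
    variant = dict(base)  # copy, so the base itself is never modified
    variant[block] = new_text
    return variant


def changed_blocks(base: dict[str, str], variant: dict[str, str]) -> list[str]:
    """List the blocks that differ between the base and a variation."""
    return [name for name in base if base[name] != variant[name]]


# Hypothetical variation: swap only the lens.
variant = make_variation(BASE_BLOCKS, "lens", "lens: 50mm prime lens")
print(changed_blocks(BASE_BLOCKS, variant))
```

Because only one block differs and the seed stays fixed, any change in the output can be attributed to that block.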
Summary
Prompt design in portrait generation is
- not about adding words
- but about designing a control structure.
Most important of all is the question:
Which steps should be left to the AI, and which should humans design?
AI is good at "generating," but it cannot "design intent."
It is the prompt that carries the design.
What you need is not technique but the perspective to break the production process down and reconstruct it.
Detailed explanation of the above prompts
This prompt is not just a "detailed description" but a "control specification designed to suppress variation in the generated results."
Below, each block is broken down from a practical perspective: what it fixes and how it limits the AI's degrees of freedom.
■Understanding the overall structure (most important)
This prompt consists of three layers:
① Shape definition (Geometry)
→ The person's physical features and the composition
② Optical definition (Optics)
→ Light, lens, and color
③ Constraints
→ Preventing the AI from deviating
👉 With all three in place, "reproducible generation" becomes possible
■① Subject design (Subject Design)
ultra-realistic portrait of a 26-year-old Japanese woman,
height 165cm, slim build, narrow shoulders, long neck,
oval face shape, small chin, straight nose bridge, slightly wide-set almond eyes
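●Role
- Prevents the person from being "averaged out"
- Suppresses variation at the skeletal level
●Points
- "26-year-old" → pins the age to a midpoint, neither too young nor too old
- "narrow shoulders / long neck" → silhouette control
- "face shape" → fixes the facial outline
👉 For AI, the "outline" varies more easily than the face itself
■② Detailed definition of eyes and skin (fine-detail control)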
dark brown iris (#3b2f2f), clear sclera, natural eyelashes,
light smooth skin with subtle pores, no blemishes, no freckles
●Role
- Avoids the "uncanny valley"
- Texture stabilization
●Points
- Specify color code → Improve color reproducibility
- sclera (white of the eye) → Prevents cloudiness
- pores → Prevent excessive AI retouching
👉 If the skin is left unspecified, it comes out looking like plastic
■③ Facial expression/gaze (the most variable element)
neutral calm expression, lips slightly closed, no smile,
eye contact directly to camera
●Role
- Fix emotional fluctuations
●Points
- no smile → Eliminate subtle changes in facial expressions
- eye contact → prevention of line of sight deviation
👉 When the gaze shifts, the subject looks like a "different person"
■④ Hair (control of noise generation source)
medium-length, straight, inward curl, center part, no stray hair
●Role
- Suppressing the biggest cause of generation failure
●Points
- Specify length + shape + parting, all of them
- Eliminating stray hair → noise reduction
👉 Hair is the most unstable part for AI
■⑤ Outfit (light reflection control)
white silk blouse, matte texture, no patterns
●Role
- Stabilize the behavior of light
●Points
- silk × matte → prevent excessive reflection
- no patterns → recognition noise reduction
👉 Patterns cause the AI to misread the image
■⑥ Pose (skeleton of composition)
seated upright, hands on thighs, shoulders angled 15°
●Role
- Prevents the body structure from breaking down
●Points
- Specifying the hands → guards against malformed fingers
- Specifying the angle → avoids the unnatural look of a fully frontal pose
👉 If the pose is not specified, there is a high probability that it will break down
■⑦ Camera position (fixed viewpoint)
eye-level, 1.5m distance, centered framing
●Role
- Stabilizing the perspective
●Points
- Specify distance → Prevent facial distortion
- eye-level → natural impression
👉Distance not specified = wide-angle distortion occurs
■⑧ Lens/depth of field (the core of photography)
85mm, f1.8, shallow depth of field
●Role
- Determines how photographic the image looks
●Points
- 85mm → portrait standard
- f1.8 → background separation
👉 This is the dividing line between a "CG look" and a "photographic look"
■⑨ Lighting (most important)
single soft light at 45°, slightly above eye level
●Role
- Creating a three-dimensional effect
●Points
- 45° → well-balanced shadows on the face
- From above → reproduces natural light
- fill 10% → fine adjustment of contrast
👉 If the light is not controlled, everything falls apart
■⑩ Environment (noise isolation)
plain warm gray background (#d6d1cc), no objects
●Role
- Keeps the background from running wild
●Points
- Single-color specification → stable recognition
- no objects → removes unneeded elements
■⑪ Color grading
warm tone, low contrast, soft highlights
●Role
- Unifies the overall impression
👉 Equivalent to the "development" step after shooting
■⑫ Style specification
editorial fashion photography, realistic, non-stylized
●Role
- Fixes the direction of the output
👉 More specific than "photorealistic"
■⑬ Negative prompt (constraint)
cartoon, anime, bad anatomy, extra fingers...
●Role
- Preventing AI from running out of control
●Points
- anatomy terms → prevent the body from losing its shape
- style terms → prevent drift toward manga/anime
👉 Often more important than the positive prompt
■⑭ Parameters
--ar 2:3 --q 2 --style raw --seed 12345
●Role
- Final control
●Points
- ar → composition ratio
- seed → core of reproducibility
👉 Complete reproduction is impossible without a seed
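If you manage prompts in a script, the parameter line can be parsed and checked so that a missing seed is noticed before generation. The sketch below only splits the flag string copied from the prompt above; `parse_params` and `aspect_ratio` are hypothetical helpers, and how a particular tool interprets each flag is outside what this code assumes.

```python
def parse_params(param_string: str) -> dict[str, str]:
    """Split a '--flag value' string into a dict of flag names to values."""
    tokens = param_string.split()
    params: dict[str, str] = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("--"):
            raise ValueError(f"expected a --flag, got {token!r}")
        if i + 1 >= len(tokens) or tokens[i + 1].startswith("--"):
            raise ValueError(f"flag {token} has no value")
        params[token[2:]] = tokens[i + 1]
        i += 2
    return params


def aspect_ratio(params: dict[str, str]) -> float:
    """Convert an 'ar' value such as '2:3' into width divided by height."""
    width, height = (int(part) for part in params["ar"].split(":"))
    return width / height


params = parse_params("--ar 2:3 --q 2 --style raw --seed 12345")
if "seed" not in params:
    raise SystemExit("no seed given: the result cannot be reproduced")
print(params, round(aspect_ratio(params), 3))
```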
■The essence of this prompt
This prompt
- is not good because it is detailed
- is stable because its variables have been pinned down
That is the design.
■Important understanding
This structure maps directly onto
Planning → Concept → Photography → Development
which is consistent with live-action production.
■Summary
The essence of this prompt is to:
- eliminate ambiguous expressions
- convert them into physical conditions
- eliminate noise factors in advance
In other words, it is designed to control the AI rather than leave things to it.
■Why English is more advantageous
① Bias in learning data
Many generative models are trained on
- Images with English captions
- English-language datasets (e.g. LAION)
👉 In other words, the link between concepts and visuals is optimized for English
② Accuracy difference in token decomposition
AI does not understand a sentence as-is; it breaks it down into tokens (roughly, units of meaning).
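For English
- “soft light”
- “85mm lens”
👉 Decomposed stably into units of meaning
For Japanese
- 「柔らかい光」 ("soft light")
- 「85mmレンズ」 ("85mm lens")
👉 Decomposition tends to be unstable
(highly context-dependent and ambiguous)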
③ English is the standard language for photography terms
For example:
- aperture(絞り)
- depth of field(被写界深度)
- rim light
- cinematic lighting
👉 This is vocabulary the models are assumed to have learned in English
■So is Japanese unusable?
Conclusion: it is usable, but its uses should be kept separate
■Case where Japanese is suitable
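① Concept/Emotion
- 「静かな朝の雰囲気」 ("the mood of a quiet morning")
- 「孤独感のある表情」 ("an expression with a sense of loneliness")
👉 Abstract concepts also work in Japanese
② Rough generation
- Idea generation
- Checking the mood
👉 Direction matters more than precision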
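■Cases where Japanese is at a disadvantage
① Physical and technical specifications (critical)
- Light angle
- Lens
- Distance
👉 These break down unless they are written in English
② When reproducibility is required
👉 If you want the same image again, English is essential
■The practical optimum (important)
👉 Hybrid operation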
●Recommended structure
[English: physics/structure]
+
[Japanese: concept/emotion]
●Example
ultra-realistic portrait, 85mm lens, soft lighting, shallow depth of field,静かな朝の空気感、少し内省的な表情、Calm atmosphere
👉 This gives the best balance
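The hybrid structure can be sketched as a small helper that keeps the two parts separate and refuses Japanese text in the technical part. `hybrid_prompt` is a hypothetical name, and the character ranges used to detect Japanese (hiragana, katakana, common kanji) are a simplifying assumption.

```python
import re

# Hiragana, katakana, and common CJK ideographs (a simplified range).
JAPANESE_CHARS = re.compile("[\u3040-\u30ff\u4e00-\u9fff]")


def hybrid_prompt(technical_en: list[str], concept_ja: list[str]) -> str:
    """Join English technical terms and Japanese concept phrases in that order."""
    for term in technical_en:
        if JAPANESE_CHARS.search(term):
            raise ValueError(f"write technical terms in English: {term}")
    # English terms use ", " and Japanese phrases use "、" as separators.
    return ", ".join(technical_en) + ", " + "、".join(concept_ja)


result = hybrid_prompt(
    ["ultra-realistic portrait", "85mm lens", "soft lighting", "shallow depth of field"],
    ["静かな朝の空気感", "少し内省的な表情"],
)
print(result)
```

Keeping the two lists apart also makes it easy to later translate only the concept part, or to reuse the technical part unchanged across different moods.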
■How to raise precision further
●① Completely in English + internal design in Japanese
- Design → think in Japanese
- Output → convert to English
👉 This is the mainstream approach in practice
●② “Intentional translation” from Japanese to English
Simple word-for-word translation does not work:
❌ "Soft light" (Japanese) → soft light
✔ "Diffused light from the window" (Japanese) → diffused window light
👉 It is important to translate into photographic terminology
■Common misconceptions
❌ "There is no problem as long as the model supports Japanese"
→ Partially correct, but not sufficient
Reason:
- The model can understand the language
- But English is dominant when it comes to the link with visuals
■Conclusion
- Images can also be generated in Japanese.
- But English is superior in terms of accuracy, reproducibility, and controllability.
■Essence
The real question is not the language but how "controllable" the description is.
In current models, however, the reality is that the control language is optimized for English.
*I tried generating images with Gemini and ChatGPT using the prompt above. (The result may not be the same photo every time.)
ChatGPT produced the kind of face commonly seen in AI images. Gemini's result looks natural, like a real person. However, Gemini ignored the instruction to angle the shoulders 15 degrees. There are many places where Gemini does not follow instructions.
[Images: Gemini (left), ChatGPT (right)]
Next, I simply typed in Japanese: "Put the hair up in a chignon."
[Images: Gemini (left), ChatGPT (right)]
Then I simply typed in Japanese: "A gold-colored slip dress with an open neckline."
[Images: Gemini (left), ChatGPT (right)]








