A quickly put together (and not very complete) summary of current research ideas on anime aesthetics classification:
Model Architecture & Training Strategies
Human labels are not scalable, so we want to develop ways to leverage existing human signals efficiently and transfer them better.
Architecture Comparison:
- We’re examining how different neural network architectures (e.g., ConvNeXt V2 vs. Vision Transformers) affect fine-tuning for aesthetic classification.
- (model performance differences aren’t very significant at small data scales, but appear more pronounced in larger-scale training runs)
Architectural Tendencies:
- Different architectures such as Vision Transformers and ConvNeXts place varying emphasis on textures versus structure, which affects their performance in aesthetic classification.
- Paper: Architectural Analysis ← also introduced the idea of using a synthetic dataset for ablations
- (and a few other papers on the topic)
Pre-training Strategies:
- We’re considering the impact of pre-training on large, general datasets and how it affects fine-tuning on our smaller, aesthetic-specific dataset.
- An alternative is to initialize the model from Danbooru taggers directly and see whether it generalizes better than general pre-trains such as ImageNet / SO400M.
- We’re also analyzing how different pre-training targets (classification, regression, unsupervised) influence results.
  - E.g., regression targets with tailored transforms may or may not transfer well to ordinal classification (human labels)
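One concrete recipe for bridging that gap (a sketch; the function names and the quantile-calibration approach are my illustration, not the project's actual method) is to calibrate cut points on a small labeled set so continuous regression outputs map onto ordinal classes:

```python
import numpy as np

def fit_quantile_thresholds(scores: np.ndarray, n_classes: int) -> np.ndarray:
    """Learn cut points that split continuous scores into n_classes
    roughly equally populated ordinal bins."""
    qs = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]  # interior quantiles only
    return np.quantile(scores, qs)

def scores_to_ordinal(scores: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Map continuous regression outputs to ordinal class indices 0..K-1."""
    return np.searchsorted(thresholds, scores, side="right")
```

The thresholds would be fit once on whatever labeled data is available, then reused to turn the regression head's outputs into ordinal predictions for comparison against human labels.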
Data Quality & Labeling Strategies
Same idea as the first part, but improving along the data-quality axis.
Data Cleaning Effects:
- We’re exploring whether removing outliers from the dataset improves model performance or makes it more brittle, especially with a limited dataset.
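One simple way to operationalize "outlier" here (a hedged sketch, one option among many, not the project's chosen filter) is to flag samples whose label disagrees strongly with a reference model's prediction, using a robust MAD-based z-score so the filter itself isn't skewed by the outliers:

```python
import numpy as np

def filter_label_outliers(labels: np.ndarray, preds: np.ndarray,
                          z_thresh: float = 3.0) -> np.ndarray:
    """Return a boolean keep-mask: True for samples whose label/prediction
    residual lies within z_thresh robust (MAD-based) standard deviations."""
    residuals = labels - preds
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med)) + 1e-8  # avoid divide-by-zero
    robust_z = 0.6745 * (residuals - med) / mad       # 0.6745 ≈ MAD→σ factor
    return np.abs(robust_z) <= z_thresh
```

Training once with and once without the flagged samples would give a direct measurement of the brittleness question above.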
Soft Labeling:
- To address label scarcity, we’re considering using soft labels based on the most probable predictions to augment the dataset.
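A minimal sketch of that idea (function names are illustrative): run the current model on unlabeled images, keep only confident predictions, and use the full softmax distribution — not a hard argmax one-hot — as the augmented training target:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def make_soft_labels(logits: np.ndarray, confidence: float = 0.8):
    """Return (mask, soft_labels): keep only samples whose top predicted
    probability exceeds `confidence`; the soft label is the full
    probability distribution rather than a one-hot."""
    probs = softmax(logits)
    mask = probs.max(axis=-1) >= confidence
    return mask, probs
```

The confidence threshold trades label quantity against label noise, which makes it a natural knob to ablate.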
Semi-supervised Learning:
- This approach could enhance model accuracy but requires more investment compared to simple soft labeling.
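To make the extra investment concrete, here is a sketch of one standard semi-supervised ingredient, FixMatch-style consistency training (my example, not something the source commits to): a confident pseudo-label from a weakly augmented view supervises the prediction on a strongly augmented view of the same image.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(weak_logits: np.ndarray, strong_logits: np.ndarray,
                     confidence: float = 0.9) -> float:
    """FixMatch-style consistency: hard pseudo-labels from the weakly
    augmented view supervise the strongly augmented view, masked so only
    confident weak predictions contribute."""
    weak_probs = softmax(weak_logits)
    pseudo = weak_probs.argmax(axis=-1)
    mask = weak_probs.max(axis=-1) >= confidence
    strong_probs = softmax(strong_logits)
    nll = -np.log(strong_probs[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((nll * mask).sum() / max(mask.sum(), 1))
```

Compared with one-off soft labeling, this loss runs inside the training loop every step, which is where the extra engineering and compute cost comes from.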
Rater-Specific Modeling
Extending the data-quality idea: consider training ensembles of models on individual raters, or optimizing further with stratified ensembles over rater groups.
Individual Classifiers for Raters:
- Training classifiers tailored to individual raters could improve robustness and potentially help in cleaning up outlier data.
Clustering Raters by Preference:
- Grouping raters based on preferences and using stratified averaging may provide a better representation of the population.
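The stratified-averaging step can be sketched as follows (assuming cluster assignments already exist, e.g. from k-means over rater preference vectors; the function name is mine): average within each preference cluster first, then average the cluster means, so large rater groups don't dominate the aggregate.

```python
import numpy as np

def stratified_rating_mean(ratings, cluster_ids) -> float:
    """Average ratings per preference cluster first, then average the
    cluster means, giving each cluster equal weight regardless of size."""
    ratings = np.asarray(ratings, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    cluster_means = [ratings[cluster_ids == c].mean()
                     for c in np.unique(cluster_ids)]
    return float(np.mean(cluster_means))
```

For example, with ratings [1, 1, 1, 5] where the three 1s come from one cluster and the 5 from another, the plain mean is 2.0 but the stratified mean is 3.0 — the minority preference group keeps its full weight.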
Explainability / Importance of Features:
- Findings from human labelling may give some insight into “what’s generally preferred” for future data collections.
- Ex. saturation, contrast, subject matter, structural information
- Ex2. anatomy may or may not be tied to perceived better aesthetics.
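A first-pass probe for the low-level features above (a sketch; these particular statistics and function names are my choice): compute simple per-image saturation and contrast, then correlate each against the human ratings.

```python
import numpy as np

def saturation(img: np.ndarray) -> float:
    """Mean HSV-style saturation of an RGB float image in [0, 1]."""
    mx = img.max(axis=-1)
    mn = img.min(axis=-1)
    return float(np.mean(np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0)))

def contrast(img: np.ndarray) -> float:
    """RMS contrast: standard deviation of the luminance channel."""
    lum = img @ np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 weights
    return float(lum.std())

def feature_score_correlation(features, scores) -> float:
    """Pearson correlation between one hand-crafted feature and ratings."""
    return float(np.corrcoef(features, scores)[0, 1])
```

Subject matter and structure would of course need richer features than this, but even crude correlations can tell us which axes are worth controlling in future data collection.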
Input & Computational Considerations
Finding computationally efficient settings for faster training and higher experiment churn rates.

Resolution Effects:
- We’re analyzing how different input resolutions impact model performance, balancing detail capture against computational cost.
- Planning to create and train on a synthetic dataset for better controls
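For planning the resolution sweep, a back-of-envelope cost estimate helps (illustrative sketch assuming a ViT-style model with 16-pixel patches; CNN scaling differs): token count grows quadratically with resolution, and self-attention cost grows quadratically with token count.

```python
def vit_token_count(resolution: int, patch: int = 16) -> int:
    """Number of patch tokens for a square input at the given resolution."""
    return (resolution // patch) ** 2

def relative_attention_cost(res_a: int, res_b: int, patch: int = 16) -> float:
    """How much more expensive self-attention is at res_a vs. res_b
    (attention FLOPs scale with tokens squared)."""
    ta, tb = vit_token_count(res_a, patch), vit_token_count(res_b, patch)
    return (ta / tb) ** 2
```

For example, going from 224px to 448px quadruples the tokens (196 → 784) but makes the attention roughly 16× more expensive, which is why higher-resolution runs cut experiment churn so sharply.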
MISC

Leveraging VLM / human critique datasets:
- https://huggingface.co/blog/PandorAI1995/image-analysis-janus
- (There’s also a dataset on this; update later)