The PhD Metagame
How to Get Your Paper Accepted
Page 1 Accepts, the Rest Avoids Reject
In 2019, I submitted a paper that was rejected with review scores 2.5, 3, 3. One week later, I resubmitted it with minor changes, and it was accepted with scores 4, 4.5, 4.5.[1] For context, that's an almost unspeakably dramatic jump in scores, from "middling reject" to "strong accept."
This post shows exactly those changes. We'll frame them in two parts:
- Polish page 1 for acceptance
- Use the remaining pages to avoid rejection
Page 1 has four parts: title, abstract, Figure 1, and introduction. We'll make them specific, memorable, and clear, so that they communicate value and hook the reader. Reviewers mostly decide accept vs reject by page 1. So we optimize the judgment-before-scroll.
Then, to make sure our paper isn't rejected, we'll do due diligence in the rest of it by including stuff like baselines, ablations, statistical significance, and human evaluation.
The tweaks that get the paper accepted (unexpectedly, happily) also improve the actual science contribution. But if you're tempted to be evil, read this footnote.[2] The full rejected and accepted submissions are available for download at the end.
Page 1 Is 80% of Your Paper
A paper has five parts:
- Title
- Figure 1
- Abstract
- Introduction
- Rest of the paper
Spend equal time on each of these.
-- Me misquoting[3] Jitendra Malik quoting Don Geman
Around 80% of a paper's perceived quality is established on page 1. The title, Figure 1, abstract, and half the introduction are all there. It's like a book's cover.
Throughout this post, I'll show the rejected and accepted versions of the paper I mentioned at the top with the dramatic score swing. Here are both page 1s:


Top/left: Rejected page 1. Bottom/right: Accepted page 1.
First, consider page 1's first impression:[4]
- Is the Figure 1 colorful and eye-catching?
- Is the title unexpected? Maybe it has one intriguing word?
- Are there any curious terms (bolded or italicized)?
- Is the introduction (hopefully not) full of citations?[5]
Choose A Specific Memorable Title
Rejected: Visually Grounded Comparative Language Generation (too general). Any work that uses pictures and generates comparisons could use this title. I picked this title because I thought it argued for the generality of the method. But a too-general title is off-putting because it comes across as over-claiming. And a big part of our method does rely on our domain: we specifically use a biological taxonomy to create our dataset.
Accepted: Neural Naturalist: Generating Fine-grained Image Comparisons (specific and memorable). In addition to branding (more next), naturalist establishes the domain, and fine-grained narrows the task. Skeptical academics appreciate the clarity of saying what you did. The title is fully unique to our work.[6]
Maybe Add Branding
I used to dislike branding in papers. It felt presumptuous to claim a proper noun for your research paper and to expect readers to memorize it. And many of the names sound corny.
Now, while I still often feel a pang of annoyance, it is outweighed by the recognition that it's much easier to remember and discuss concepts which have a name. Neural Naturalist or Birds-to-Words instead of "our 2019 EMNLP paper about generating comparative image captions..."
That said, I still dislike throwaway names: those with no conceptual link, or which don't feel earned. I don't think every paper needs one. But I think it helped for this paper.
Show Screamingly Obvious Value in Figure 1
The main point is that your paper's value should be obvious, not that it must be enormous.


Top/left: Rejected Figure 1. Bottom/right: Accepted Figure 1.
A Figure 1 should
- draw readers in
- clearly demonstrate both what the work does and its value
- be comprehensible without the caption.
The old Figure 1 showed two separate comparisons, but the link between them wasn't clear. The birds in the bottom row all just look like owls to a non-expert. And the descriptions are long and boring.
The new Figure 1 makes the work's focus explicit by anchoring with the same left image, and labeling each comparison with a perceptual difficulty ("high" vs "medium"). It annotates the operation ("vs" = comparison) and the result ("highly detailed" vs "fewer details"). At this point, the paper's mechanics and unique characteristic have been established: we use different language to compare things based on how similar they look. Finally, to make the long descriptions more approachable and interesting, we've highlighted two components (features and parts, with orange underlines and green bubbles).
A problem with making Figure 1s (and describing your research in general) is that you know so much about it that it's impossible to mentally model what it'd be like to learn about your work for the first time. Spending time away from your work is extremely helpful here, if possible. I think I benefitted by having the conference review period (a month or two?) away from the paper, so I could come back to it with fresh eyes and rethink how best to illustrate it.
I've written about Figure 1s before. Even at the peak of my Figure 1 game, it was normal to make ten drafts before submitting.
End Each Caption with the Takeaway
I think this is the single best paper-writing hack I've ever learned.
This Figure 1 is so information-dense that nearly the whole caption is the takeaway (yellow). Compare that with the old caption, which has a side note (red) taking up nearly 1/3 of the (extremely valuable front-page) real estate!


Top/left: Rejected Fig 1 caption. Bottom/right: Accepted Fig 1 caption.
A takeaway message explains not what is literally being shown in the figure (that comes first), but what you should think about it.
It might feel strange to do this in scientific writing, because it feels like it crosses the boundary from description into interpretation. But I urge you to do it, especially for less formal fields like computer science because:
- You're saving readers time trying to understand what point you are trying to make by just writing it out.[7]
- With good captions, you can understand the whole paper by only looking at the figures. Many (most?) future readers will read your paper this way.
- The scientific reader has a grain-of-salt mindset about everything you write anyway, so don't stress about the "interpretation" aspect.
If you aren't trying to prove a point, well, perhaps reconsider that figure.
I got even more brazen about takeaways in future papers, even writing bolded "Takeaway:" in the caption itself.[8]
The Abstract: A Specific Valuable Hook
A classic mistake for a certain type of nerd (e.g., me) is to write top-down, going from general concepts to your specific topic. This is tempting because it feels orderly and taxonomic.


Top/left: Rejected abstract. Bottom/right: Accepted abstract.
But this turns out terribly, as you can see in the rejected abstract. It's both boring and feels over-claiming. After top-down framing, and an aside, there's a "betrayal" of scope when we reveal our actual task.[9]
Everything is more specific in the revised abstract: what we study, our contributions (dataset and model), all the way to literal descriptions of specific birds and the task done in human evaluations. There's a results teaser, and a hint of a unique hook. It's not only more specific, it's more fun and compelling to read.
You don't think your reader wants to have fun and read something compelling? Try reviewing conference papers. Enjoyable writing is like water in a desert. Reviewers won't even realize why they're happy, they'll just like the paper. Read YOLOv3 and tell me you don't enjoy it.[10]
Use Tension/Release Cycles in the Intro
Can you believe we're still on page 1? It's that important.
Here we're discussing specifically the portion of the introduction visible on page 1. We're optimizing for what we could call judgment-before-scroll.
My original draft was so bad it's easy to improve. But if I could write something this bad as a 4th year PhD student, others could too.


Top: Rejected page 1 intro. Bottom: Accepted page 1 intro.
My original introduction completely lacks any mention of a problem, and is devoid of tension. It begins with a top-down pile of related work, then side-swipes our own paper with negative implications.
The revised introduction launches straightaway into the problem.
It uses tension/release cycles at multiple resolutions to build up the stakes of the problem and the perceived value of solving it. First, at the paragraph-scale: ¶ 1+2 builds up the problem (tension), ¶ 3 presents our solution (release). Then at sentence-scale, unstable language creates tension: "but," "difficult," "strain," "while X, Y," "unfortunately."
On the backbone of these tension/release cycles, we spend the first two paragraphs setting up our task as being specific, difficult, valuable, and unique. And I really mean each of those adjectives. The final bit of visible text (on page 1) introduces a concrete contribution, our dataset.
I hesitate to recommend a video here because it's both slightly abstract and eighty minutes long, but Larry McEnerney's talk on Effective Writing is the single best material I've seen on thinking about your writing. I saw it way after grad school, but I wish I'd seen it during because I spent a lot of time blindly reverse engineering bits of it (on display in this essay). Some relevant key points:
- All of life before your job, people (teachers) have been paid to read your writing
- Now that they're not, your writing must deliver value (which is often entertainment)
- High-value text poses problems with tension-filled language, articulating costs or benefits
I didn't understand this framing (of problem, tension, value) while writing the revision. But in hindsight, it's shockingly clear how faithfully the improved draft adheres to it.
Use the Rest of the Paper to Avoid All Reasons for Rejection
If we've done our job, reviewers have now finished reading page 1 and want to accept our paper. Our job now is to let them. How?
Surprise, I have another great two-step process. It uses thinking in reverse:
- Think of all the reasons a reviewer might reject your paper
- Avoid everything in 1.
The more obvious reasons for rejection have to do with completeness: "you didn't compare against method X." But those are often used as objective crutches to justify a gut decision based on lack of clarity. So we must ensure completeness and also polish up the clarity.
After page 1, the main changes I made are:
- improving all the figures and tables (clarity)
- adding baselines (completeness)
- adding ablations (completeness)
- rewriting the conclusion (clarity)
For reference, other common additions are:
- human evaluations (completeness)
- statistical significance (completeness)
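If you want a starting point for the statistical significance item, here is a minimal sketch of a paired bootstrap test in Python. It is not what we ran for this paper: the function name `paired_bootstrap`, its parameters, and the toy scores are all made up for illustration, and it assumes you have one score per test example for each of two systems.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Estimate how often system A fails to beat system B under resampling.

    scores_a and scores_b are per-example scores (higher is better),
    aligned so that index i refers to the same test example in both lists.
    Returns the fraction of bootstrap resamples in which A's total score
    is not strictly greater than B's.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per example")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        # Resample test examples with replacement and sum the differences.
        sample_sum = sum(diffs[rng.randrange(n)] for _ in range(n))
        if sample_sum <= 0:
            not_better += 1
    return not_better / n_resamples


# Toy data (hypothetical): A beats B on every example, so no resample
# can ever favor B.
print(paired_bootstrap([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # prints 0.0
```

A small returned value suggests system A's advantage is unlikely to be a fluke of which test examples you happened to draw. What counts as small is your call; 0.05 is a common convention.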
The running text is nearly identical. This is great because someone skimming the paper (looking at only figures, tables, and the conclusion) can enjoy all the improvements.
Make Figures Dense and Beautiful
There's this complicated part of the paper called pivot-branch sampling. I was very excited about it but nobody else cared about it. (I think not even my coauthors, though they were too kind to ever say so.)
I had the decency to relegate most of pivot-branch sampling to the appendix, but it has to be mentioned a little bit in the body because it's in a dataset paper.
Still, the clarity just wasn't there. Figure 2 was supposed to help, but it didn't. In the revision, I added some graphics, which help get the idea across quickly.


Top/left: Rejected Figure 2. Bottom/right: Accepted Figure 2.
In the rejected version, I thought lighter gray text would be nice because there's a design rule that you shouldn't use pure black. But it contrasted weirdly with the paper's body text, which has a maddeningly adjacent font and is pure black.
In the accepted version, I went with a sans-serif, black text which helped the figure feel solid and distinct. And more importantly, I used the real estate to illustrate a complicated thing with a natural visual (the pivot-branch sampling).
Go Ahead and Invent a Helpful Taxonomy
The first reviewers were confused about our dataset. Was it interesting or valuable?
I had shot myself in the foot with crappy writing that situated the contribution as incremental and marginally different (see abstract and intro sections above), but there's no harm in over-correcting, right?
We first introduced this table (new in the revision) just to contrast example sentences from the most related datasets. This alone would have been great because examples are densely impactful brain magic.
But one of the biggest brain blasts I had was realizing that I could simply invent helpful axes (circled) along which to compare the datasets.
Dataset comparison table (new in accepted version).
Not only are examples incredibly helpful to get a flavor of things, the taxonomy I made up helps with quantitative (ish) framing.
Inventing the dataset taxonomy helped free up my brain from imaginary rules. For example, the data citations wouldn't fit in the table without destroying the alignment. What to do? Well, I simply moved them to the caption. Can you do that? Nobody complained.
Sprinkle in Graphics for Variety
A chart helps break up the visual rhythm of a paper. Plus, it can demonstrate a property that's otherwise hard to grasp. (Here: that we have longer text than other datasets.)


Top/left: Rejected dataset stats. Bottom/right: Accepted dataset stats.
Don't forget, we're still putting the takeaway message at the end of the caption.
Make Your Contribution Shine
I had done a bad job highlighting how interesting the model was. In the revision, I not only drew out the components we ablated (yellow, red), but I used color to link them to the results table later in the paper. As a bonus, we now have warm colors (yellow, red) for the encoder and cool colors (blue, green) for the decoder.


Top: Rejected model figure. Bottom: Accepted model figure.
I help the reader out by telling them in advance, in the takeaway sentence of the caption, which configuration of the model works best. This is another good trick to remember: don't withhold information to surprise readers. They like to know early and often. I am guilty of withholding, and it's still a hard habit to break.
Delete Stuff Around the 2/3 Mark
Several changes above take up more space. Where do we cut?
In an eight-page paper, pages five through seven probably contain good candidates.
Fortunately, we already had a figure with an excessive number of outputs. I'm a big fan of showing your system's outputs, so I'd included nine scenarios (i.e., eighteen total photos and paragraphs). This is great, but trimming to six scenarios still leaves plenty. Plus it let us be pickier with which ones were included.[11]
Example outputs. Top row removed in the resubmission.
Notice there's no takeaway sentence here. Rules are guidelines. If the takeaway feels belabored and out-of-place, omit it.
Add Everything You Might Ask For
This is where the thinking in reverse part comes in at full force. Think of the most common reviewer complaints and avoid them.
The easiest reasons reviewers could give to reject you were:
- lack of baselines
- lack of ablations
- lack of human evaluation
So, add those things.


Top: Rejected results. Bottom: Accepted results.
My favorite part is in the takeaway (yellow), we highlight and explain a weak-looking result (blue).
The baselines and ablations took relatively little work to run and probably improved the actual science contribution (more on that soon).
We already had a dream human evaluation, which is getting people to use the captions for an objective task (i.e., can you pick which animal is which?) rather than scoring them on subjective quality metrics (e.g., how fluent is the text 1–5?). No changes there.
Go a Little Overboard
Somehow we made space for an enormous table of ablations. Running lots of ablations[12] is a luxury of having a small dataset.[13]
Ablations table (new in the revision).
You don't have to go overboard in the ablations. Just maybe somewhere. In future papers, I went overboard in the appendix. Including lots of information (tastefully) shows that you really care and that you did a lot of work.
The Three-Sentence Conclusion
To revise the conclusion, distill the advice from the abstract and introduction. Also, remove all the framing. We're left with a concrete, three-sentence highlight reel.


Top/left: Rejected conclusion. Bottom/right: Accepted conclusion.
Normal writing advice would say something like this: Write your conclusion using three sentences:
- What do we do?
- Why is it great?
- Why does it matter?
But check out the rejected conclusion. It (roughly) follows this structure too! The real improvement in the revision is specificity.
The Science Thing Was Improved
After making these mostly aesthetic revisions and seeing the paper accepted with dramatically higher scores, the initial thrill inevitably wore off. I grew more cynical of science. While we had improved the framing of our work, I thought, the core science thing we achieved was the same: the dataset, the model, the human evaluation, and the overall task framing itself (which is the hardest part).
Now, I believe such seemingly-surface dressings actually strengthen the underlying science thing. Let me try to convince you why.
The primary objects of modern science are research papers. Research papers are acts of communication. Few people will actually download and use our dataset. Nobody will download and use our model: they can't, because it's locked inside Google's proprietary stack.[14] But anyone who reads our paper could learn from what we did, and all the revisions to clarity and completeness improve how much they can learn per minute spent reading. And it's not just a pace thing: there's a threshold of clarity that divides "learned nothing" from "got at least one new idea."
Science is communication.[15] Dramatically improving communication improves the science.
Aside: The idea of "making a reader want to read more" has an unexpected link to game development. You'd think there'd be no need for such antics in a scientific research paper, yet dull, obtuse prose can scare off readers, obscure the message, and deflate the contribution's impact. Getting readers to the end (at least of page 1) is a necessary goal to optimize for. Just so with game design and "hooks." Games employ several hooks to draw players along, which might quickly be lumped into: stories build tension, todo lists beg completion, and "number goes up." Omitting these entirely robs a game of "stickiness,"[16] leading players to grow bored and stop early. In both papers and games, we must learn to make the object sufficiently engaging so that its consumer is driven to experience the bulk of our creation.
Appendix: Full PDFs
If you'd like to check out the original, raw PDFs that we submitted, they're available for download here. The appendices (i.e., supplementary material) are nearly identical, but I've also included them for completeness.
Footnotes
[1] Review scores spanned 1–5, with 5 = "consider for best paper," and 3 = "weak accept." The conferences were both of equal prestige (ACL and EMNLP respectively). Also, I use "I" for simplicity, but as always, this was work done with coauthors.
[2] Please do good work before optimizing your paper. I'm assuming in this post that you are doing quality research, and you want it to be published to further your career. You need to get past the gatekeeping reviewers. In other words, please use this process for good and not evil. But if you do use it for evil, it's not a big deal either. Another ignored paper will be in a conference instead of just on arXiv.
[3] I added "Figure 1," but I stand by my revision. Thanks to Kenneth Marino and David Freire for finding the source of this quote. Jitendra's talk is great. I watched it after writing the first draft of this and couldn't believe how much overlap there was! (I had never seen his talk, but someone who went told me about that quote.) Also, aside, don't get hung up on senior advisors thinking they actually spend as much time working on the title as you do writing the rest of the paper. Yes, the title is really, really important, but they don't. Let them think they do.
[4] Its bird's-eye (ahem) view.
[5] It's OK if so, but it's a different vibe, and probably harder to pull off, more in line with an opinion piece.
[6] Having now watched Jitendra's talk (linked in the quote above), I can say he articulates this brilliantly: the title should "evoke the key concept of the paper" and "be memorable." But my favorite part: "think about it in terms of the conditional entropy;" your title should only be able to describe your paper and no one else's (at a conference).
[7] I must point out again that your point will be so obvious to you because it's why you spent hours making the figure, but a new reader may barely spend enough time looking at your thing to understand what the axes are. Help them out. Even stuff like "higher is better" is helpful unless completely trivial.
[8] E.g., check this one from Scarecrow (Dou & me et al., 2022)
This is a great example because the table's interpretation is so complicated that even I (who wrote it) had forgotten what the takeaway was supposed to be a few years later, and would not have easily rediscovered it.
[9] Why do we feel betrayed? I think because there's an implicit promise that if you're talking about something, your paper is going to address it. So if you're outlining broad swaths of a field, even if in an attempt to just situate your work, it can come across as implying that you're contributing to this whole grand situation. There's a delicate balance to strike. Some context in the intro or related work is often necessary.
[10] As with everything, strike a balance. Engaging writing and very unique hooks (e.g., having the phrases "citizen science" and "biodiversity" in an NLP paper) must come as sprinkles on top of a solid contribution that appropriately satisfies the community's expectations.
[11] I think the other place we saved the most space was in the qualitative analysis. I could probably write eight pages of only qualitative model analysis, so I always end up with too much in the first draft.
[12] The blind bolding of higher numbers without statistical significance tests is truly heinous, I know. I hope somebody has standardized tests that you run on output metrics by now to do this. (Just kidding, I'm sure they haven't.)
[13] Also, being somewhere like Google. DeepMind wasn't busy with the TPUs that week so we added a bunch of flags and let them go brrr. But the dataset is so small that by the time Google's ancient behemoth cluster system had made a dashboard where I could see how the run was going, it had already run over the whole training dataset (potentially many times, memory is failing me).
[14] Even if it were open source, let me tell you from first-hand experience that getting someone's research code to run is no small feat, especially under even marginally different conditions.
[15] See Science 1 vs Science 2 in this essay series for more of this argument.
[16] On the other hand, leaning too hard into them and using darker patterns (like gambling mechanics) can cause addiction (and bankruptcy).