1. Keep a plan
Congrats, you’re in the middle of the most important step! You have limited time, so make sure to spend it meaningfully: think about what you want to convey and what you need to show for your arguments to work. Write your thoughts down, ideally in a paper draft, set yourself project milestones, and make sure everything you do serves what you want to accomplish. Even if you have a cool new idea, ask yourself, “will this improve my paper?” before embarking on a tangent. Keep this plan up to date at all times to make sure you stay on track!
2. Define your niche
We all want better, faster, more general, more reliable results, but what does that mean for your idea? Positioning yourself in the existing research landscape (including baselines and existing benchmarks) will help you identify how your experiments need to be built. Use this information to update your plan. Ideally, summarize your niche in a value proposition that explains how your idea differs from the three closest works and how the research niche you are working on differs from other niches in the literature.
3. Find an oracle
In theory, there are many great ideas, but practice tends to kill many of them. Make sure that before you try to scale your idea to state-of-the-art problems and algorithms, you test its merit. To do this, use the smallest & simplest benchmark and metric you defined as meaningful in the last step – now break and fix your approach by playing with the benchmark.
This means:
- Identify the weakness you want to eliminate and exaggerate it (e.g., by introducing data bias, using tiny models, or adding a huge amount of noise). Now, you should be able to observe a negative effect.
- Remove this weakness using the idea you’re trying to build - this doesn’t need to be your method yet, but if you argue that, e.g., ensembling can improve robustness to noise, simulate an ensemble in the most handcrafted way you can think of.
Use an approach where you control all variables, even if it’s impractical or uses knowledge of the solution. Your mission is to eliminate all explanations but one: the basic idea of your approach works. This fix can now be your performance oracle that you can strive to reach with your more practical method and harder benchmarks.
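To make the break-and-fix idea concrete, here is a minimal sketch in Python, following the ensembling example above. Everything in it is an assumption made for illustration: the toy "model" is just a true value plus Gaussian noise, the weakness is exaggerated by turning up `noise_std`, and the handcrafted oracle averages several independent predictions, which only works this cleanly because we know the noise is independent and zero-mean. All names (`noisy_prediction`, `break_and_fix`, and so on) are hypothetical.

```python
import random
import statistics

TRUE_VALUE = 1.0  # hypothetical ground truth the toy "model" should predict


def noisy_prediction(noise_std, rng):
    """One prediction from a toy model whose only flaw is additive noise."""
    return TRUE_VALUE + rng.gauss(0.0, noise_std)


def baseline_error(noise_std, trials, rng):
    """Mean absolute error when we trust a single noisy prediction."""
    errors = [abs(noisy_prediction(noise_std, rng) - TRUE_VALUE) for _ in range(trials)]
    return statistics.mean(errors)


def oracle_error(noise_std, trials, rng, members=25):
    """Mean absolute error of a handcrafted ensemble.

    This is the oracle: it averages `members` independent predictions,
    which is impractical but isolates the idea that averaging cancels noise.
    """
    errors = []
    for _ in range(trials):
        votes = [noisy_prediction(noise_std, rng) for _ in range(members)]
        errors.append(abs(statistics.mean(votes) - TRUE_VALUE))
    return statistics.mean(errors)


def break_and_fix(noise_levels=(0.1, 1.0, 5.0), trials=500, seed=0):
    """Exaggerate the weakness step by step and compare baseline vs. oracle."""
    results = {}
    for noise_std in noise_levels:
        rng = random.Random(seed)  # fixed seed so the comparison is repeatable
        results[noise_std] = {
            "baseline": baseline_error(noise_std, trials, rng),
            "oracle": oracle_error(noise_std, trials, rng),
        }
    return results


results = break_and_fix()
for noise_std, errors in results.items():
    print(f"noise={noise_std}: baseline={errors['baseline']:.3f}, oracle={errors['oracle']:.3f}")
```

If the baseline error grows with the noise level while the oracle stays clearly below it, you have both the negative effect and a target to aim for. If the oracle does not help even in this controlled setting, that is worth knowing before you scale up.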
4. Record and analyze everything
You induced some sort of negative effect in your baseline, so what is that effect? What does it do? When does it occur? How strong is it in which settings? And how does your oracle react in comparison?
Make sure you record EVERYTHING about all of your experiments so you can shed light on your approach. Have plotting set up for all metrics that could be relevant. Plot them in different ways: ranking, means, with and without outliers. If statistical testing makes sense, implement it at the start and run it from the very beginning. You need to be absolutely sure about what you’re seeing so that going forward, you can base new hypotheses on your previous experiments.
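As a rough sketch of what "record everything" could look like, the Python snippet below logs each run as one JSON line, summarizes a metric with and without outliers, and runs a simple paired sign-flip permutation test. These are illustrative choices, not a prescribed setup: the function names are hypothetical, the 1.5 x IQR outlier rule and the permutation test are just one reasonable option each, the in-memory `io.StringIO` stands in for a real log file, and the scores are made up purely for the example.

```python
import io
import json
import random
import statistics


def log_run(stream, config, metrics):
    """Append one run (full config plus all metrics) as a JSON line."""
    stream.write(json.dumps({"config": config, "metrics": metrics}, sort_keys=True) + "\n")


def load_runs(stream):
    """Read back every logged run."""
    return [json.loads(line) for line in stream if line.strip()]


def summarize(values):
    """Mean, median and spread, plus the mean after dropping 1.5*IQR outliers."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [v for v in ordered if low <= v <= high]
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std": statistics.stdev(ordered),
        "mean_without_outliers": statistics.mean(kept),
        "n_outliers": len(ordered) - len(kept),
    }


def paired_sign_flip_test(baseline, candidate, n_resamples=2000, seed=0):
    """Approximate two-sided p-value for the mean paired difference."""
    rng = random.Random(seed)
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Made-up scores, one per seed, purely for illustration.
baseline_scores = [0.61, 0.58, 0.64, 0.60, 0.59, 0.62, 0.57, 0.63]
oracle_scores = [0.71, 0.69, 0.72, 0.70, 0.73, 0.70, 0.68, 0.74]

log = io.StringIO()  # stand-in for a real log file
for seed, (b, o) in enumerate(zip(baseline_scores, oracle_scores)):
    log_run(log, {"method": "baseline", "seed": seed}, {"score": b})
    log_run(log, {"method": "oracle", "seed": seed}, {"score": o})

log.seek(0)
runs = load_runs(log)
p_value = paired_sign_flip_test(baseline_scores, oracle_scores)
print(len(runs), summarize(baseline_scores), p_value)
```

Logging the full config next to every metric is what lets you re-plot and re-test old runs later without rerunning them.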
5. Iterate & improve 🚀
Let’s recap: you made a plan, settled yourself into a research niche, showed a flaw in the current state of the art and analyzed why it is there. Now you should be in the best position to achieve great results. Make sure to continuously review your plan, add new observations and make new hypotheses based on your empirical findings. Transfer your oracle to a practical method step by step and scale up your problem settings in turn according to the data you find. This way, you’ll hopefully move in small steps that allow you to diagnose any issues along the way.
In practice, this might look something like this:
You started on an artificially noisy subsection of a benchmark with an oracle that draws on meta-knowledge about the solution. The signal you got was very strong across all metrics. In a first step, you could try a simple approximation that isn’t very efficient instead of the meta-knowledge, but stay on the small artificial subsection of the benchmark. Your results stay pretty consistent, so you try the full benchmark, still artificially noisy. You observe that now your results are much worse with a very high standard deviation because of a few outliers. You test the oracle on everything - it seems that the approximation is the issue and you need to look into an alternative that has the same performance but is more reliable on the rest of the benchmark.
In this example, at each step there is a clear way forward since every new experiment only varies a limited number of factors and we collect all the insights we need to make new projections.
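One way to keep yourself honest about this is to write the sequence of experiments down as configs and check how many factors change between consecutive steps. The Python sketch below encodes the walk-through above as such a ladder. The factor names (`method`, `subset`, `noise`) and the helper functions are hypothetical, and the limit of one change per step is an assumption you may want to relax.

```python
# Each entry is one experiment from the walk-through, described by its factors.
LADDER = [
    {"method": "oracle", "subset": "small", "noise": "artificial"},
    {"method": "approximation", "subset": "small", "noise": "artificial"},
    {"method": "approximation", "subset": "full", "noise": "artificial"},
    {"method": "oracle", "subset": "full", "noise": "artificial"},
]


def changed_factors(previous, current):
    """Names of the factors that differ between two experiment configs."""
    keys = set(previous) | set(current)
    return sorted(k for k in keys if previous.get(k) != current.get(k))


def check_ladder(ladder, max_changes=1):
    """List what changes at each step; fail if a step changes too much at once."""
    steps = []
    for previous, current in zip(ladder, ladder[1:]):
        changed = changed_factors(previous, current)
        if len(changed) > max_changes:
            raise ValueError(f"step changes too many factors at once: {changed}")
        steps.append(changed)
    return steps


for step, changed in enumerate(check_ladder(LADDER), start=1):
    print(f"step {step}: changed {changed}")
```

If a step fails the check, a surprising result at that step has more than one possible explanation, which is usually a hint to insert an intermediate experiment.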
Remember, this is an iterative process, so it can take a few attempts to succeed. If you take it step by step and remember the overarching plan, though, you’ll always have a knowledge base to fall back on.
Good luck and have fun!